Abstract


Advances  in  hardware  and  software  technology  have  led  to  the  development  of 
automated  research  systems.  The  Air  Force  Research  Laboratory  (AFRL)  utilizes  the 
Adaptive  Rapid  Experimentation  and  Spectroscopy  (ARES)  system  to  synthesize  carbon 
nanotubes.  The  AFRL  researchers  are  investigating  different  approaches  that  can  improve 
the  experimental  capability  of  ARES  from  automation  to  autonomy.  Carbon  nanotubes 
are  discussed  as  an  emerging  technology  for  many  applications,  but  AFRL  has  yet  to 
discover  what  factors  optimize  the  nanotube  initial  growth  rate.  In  this  study, 
experimental  planning  software  was  written  for  ARES  that  autonomously  designs  and 
executes  experiments  based  on  the  Response  Surface  Methodology  (RSM).  RSM  is  a 
statistically-based  method  of  sequentially  planning  experiments  to  find  the  optimal 
settings  of  independent  variables  that  optimize  the  value  of  a  dependent  response 
variable.  This  thesis  discusses  the  development  and  early  success  of  the  initial  version  of 
the  planning  software.  As  this  is  a  relatively  new  research  area  spurred  by  recent 
advancements  in  materials  research  technology,  detailed  discussion  is  also  provided  on 
the  unique  challenges  of  creating  autonomous  research  robots. 
AUTONOMOUS EXPERIMENTATION OF CARBON NANOTUBE GROWTH USING RESPONSE SURFACE METHODS

I.  Introduction 

1.1  Autonomous  Systems 

The  Air  Force  (AF)  Chief  Scientist’s  2010-2030  science  and  technology  vision, 
“Technology  Horizons”,  states  “a  key  finding  is  the  need,  opportunity,  and  potential  to 
dramatically  advance  technologies  that  can  allow  the  Air  Force  to  gain  capability 
increases,  manpower  efficiencies,  and  cost  reductions  through  far  greater  use  of 
autonomous  systems  in  essentially  all  aspects  of  Air  Force”  (Chief  Scientist,  2010:130). 
In  response  to  this  vision,  the  Air  Force  Research  Laboratory  (AFRL)  released  a  strategy 
to  develop  and  improve  autonomous  systems.  The  terms  automation  and  autonomy  are 
often  used  synonymously,  but  AFRL  developed  clear  definitions  that  separate  these  two 
levels  of  system  operability.  An  automated  system  can  function  with  little  or  no  human 
involvement,  but  is  limited  to  performing  specific  actions  from  the  initial  system  design 
(AFRL,  2013).  An  autonomous  system  includes  “a  set  of  intelligence-based  capabilities 
that allow it to respond to situations that were not pre-programmed or anticipated in the
design”  (AFRL,  2013).  Automation  is  only  a  fraction  of  autonomy,  so  increasing  the 
level  of  autonomy  in  automated  systems  should  improve  manpower  efficiency  and 
reduce  costs  as  described  by  the  AF  Chief  Scientist. 

In  2012,  the  Defense  Science  Board  (DSB)  also  released  a  report,  “The  Role  of 
Autonomy  in  Department  of  Defense  Systems”,  which  discusses  many  different 
applications  and  benefits  of  autonomy  (DSB,  2013).  However,  the  DSB  report  fails  to 
mention the use of autonomy in experimental systems. Automated experimental systems
can  perform  multiple  experiments  based  on  the  inputs  of  the  researcher.  After  a  set  of 
experiments,  the  researcher  must  analyze  the  results  and  then  determine  another  set  of 
experiments  to  progress  towards  a  certain  objective.  The  addition  of  autonomy  to  an 
experimental  system  can  eliminate  the  need  for  frequent  intervention  by  the  researcher. 

Experimental  autonomy  leaves  more  time  for  the  researcher  to  focus  on  subject- 
matter  research  rather  than  the  design  and  execution  of  experiments.  Scientific 
researchers  may  not  have  a  strong  familiarity  with  Design  of  Experiments  (DOE),  so 
autonomous  software  can  reduce  costs  by  planning  fewer  experiments  to  achieve  the 
same  or  better  results. 

Any  autonomous  system  can  fail  if  the  user’s  trust  in  the  software  is  lost.  Trust  is 
established through successful results as well as an effective interface that
communicates  progress  and  results  to  the  user.  A  quality  understanding  of  the 
methodology  and  techniques  applied  by  the  software  helps  to  foster  trust. 

Additionally,  the  user’s  patience  is  a  major  factor  in  autonomous  experimental 
systems.  If  the  user  does  not  trust  the  autonomous  experimental  system,  the  process  may 
not  be  allowed  to  reach  its  end  state.  With  a  loss  of  patience,  the  user  might  decide  to 
terminate  the  experimental  process  before  the  software  is  able  to  reach  a  significant 
conclusion. 

To  promote  trust  and  ensure  that  the  user  remains  patient  with  the  software, 
several  objectives  can  be  accomplished.  The  following  objectives  are  desired  for 
autonomous  DOE  software  to  operate  effectively  over  a  long  period  of  time: 
1. The user interface is understandable and easy to use.

2.  Decisions  made  by  the  program  are  virtually  equivalent  to  what  a  human  expert 
in  DOE  would  decide  in  the  same  situation. 

3.  Display  the  status  of  each  step  of  the  experimental  process  to  provide  awareness 
of  important  decisions  made  by  the  software. 

4.  Optimize  the  desired  response  variable  within  the  experimental  process,  so  the 
user  is  not  regularly  required  to  select  additional  inputs. 

5. Plan experiments within the feasible region of execution to ensure the system
operates properly.

Accomplishment  of  these  five  objectives  will  facilitate  the  usefulness  and  longevity  of 
the  autonomous  DOE  software. 

1.2  Carbon  Nanotube  Growth  Research 

In  the  17th  century,  Muslim  weapon  forgers  designed  legendary  weapons  known 
as  Damascus  Sabers.  During  the  Crusades,  these  sabers  were  highly  effective  against 
European  warriors,  because  they  were  supremely  sharp,  strong,  and  flexible.  In  2006, 
scientists  discovered  that  the  secret  behind  the  Damascus  Saber’s  superiority  was  that 
the  weapon  forgers  had  unintentionally  created  carbon  nanotubes  within  the  steel 
(Fountain,  2006).  Scientists  today  can  intentionally  develop  carbon  nanotubes,  but  there 
are  still  many  challenges  in  production  to  overcome.  The  tensile  strength  and  flexibility 
of  carbon  nanotubes  can  lead  to  many  potential  applications  and  an  important  role  in  the 
future of nanotechnology. Some potential applications include a space elevator,
fighting cancer cells, replacing Kevlar, and solar cells. Before any of that is possible,
scientists  must  identify  what  factors  significantly  affect  carbon  nanotube  growth. 

AFRL  is  one  of  many  parties  interested  in  carbon  nanotube  experimentation.  The 
Soft  Matter  Materials  Branch  (RXAS)  at  AFRL  acquired  a  machine,  the  Adaptive  Rapid 
Experimentation  and  Spectroscopy  (ARES)  system,  that  can  eventually  execute  up  to 
one  hundred  carbon  nanotube  experiments  in  a  single  day  (Nikolaev  et  al.,  2015).  ARES 
applies  laser-induced  Chemical  Vapor  Deposition  (CVD)  to  synthesize  carbon 
nanotubes.  CVD  synthesizes  carbon  nanotubes  with  three  main  components:  a  heat 
source,  a  hydrocarbon  gas  mixture,  and  a  catalyst.  The  researchers  can  adjust  the  settings 
of  these  components  and  several  other  factors  to  produce  carbon  nanotubes  with 
different  growth  characteristics  and  properties.  They  would  like  to  characterize  nanotube 
production  as  a  function  of  these  factors.  However,  the  researchers  are  not  deeply 
familiar  with  DOE  and  the  planning  of  rigorous  experiments  to  reach  their  research 
goals.  The  AFRL/RXAS  researchers  desire  an  experiment  planner  computer  program 
that  autonomously  characterizes  and  optimizes  the  initial  growth  rate  of  carbon 
nanotubes. 

The ARES system includes several other experiment planner options that apply
machine learning techniques. The RSM planner software is the first that includes an
actual DOE-based approach to optimize a response variable. Machine learning
techniques are reliant on the database of previous experimental results. The current
databases were not obtained using DOE principles. RSM is capable of exploring the
entire region of operability to find potential solutions. Machine learning techniques are
computationally rigorous and typically difficult for novice users to understand, while
RSM requires little computation and is fairly simple to understand. The researchers
expect  the  RSM  planner  to  operate  conveniently  and  to  obtain  significant  findings  faster 
than  the  other  options,  because  of  its  ability  to  optimize  and  apply  efficient  experimental 
designs. 

1.3  Response  Surface  Method 

Response Surface Methodology (RSM) is a collection of statistical techniques
that  is  useful  when  modeling  a  problem  that  includes  many  factors  influencing  the 
response  of  interest  (Montgomery,  2008:478).  RSM  is  well-suited  to  work  with  carbon 
nanotube  experimentation  due  to  the  large  number  of  factors  and  because  the  researchers 
believe  the  response  surface  is  highly  nonlinear  (Nikolaev  et  al.,  2015).  RSM  can  be 
applied  to  this  problem,  because  all  of  the  variables  are  continuous  and  response 
optimization  is  desired.  A  computer  program  is  designed  and  coded  for  ARES  to 
autonomously  plan  experiments  by  following  the  RSM  approach.  This  program  is  coded 
in  the  C#  language  for  compatibility  with  the  ARES  software.  The  program  is  capable  of 
experimenting  with  up  to  six  different  factors  with  the  goal  of  maximizing  the  initial 
growth  rate  of  carbon  nanotubes. 

Before  the  RSM  process  starts,  the  researchers  can  adjust  several  input  categories 
to  include  the  initial  search  location,  factor  level  sizes,  and  factor  level  boundaries.  The 
program  plans  the  appropriate  experiments  based  on  the  initial  inputs  and  the  current 
stage  of  the  RSM  process.  The  process  continues  until  the  initial  growth  rate  stops 
increasing  in  value.  A  local  and  possibly  global  solution  is  found  when  the  response 
surface appears in the canonical form of a local maximum. Once this solution is
obtained,  the  program  ceases  to  plan  experiments  and  reports  the  optimal  setting  of  each 
factor  and  the  maximum  response  value.  Maximizing  the  initial  growth  rate  is  the  first 
step  towards  maximizing  carbon  nanotube  production.  The  results  of  the  RSM  process 
should  provide  insight  into  what  factors  significantly  affect  the  initial  growth  rate. 

1.4  Limitations  and  Scope 

The  program  is  designed  to  operate  with  no  more  than  six  factors  and  maximizes 
one  response  variable.  These  factors  are  the  only  variables  of  current  interest  to  the 
researchers.  Of  the  six  factors,  half  are  continuous  process  variables  and  the  other  half 
are mixture variables. Future changes to the number and type of variables will require
a  major  adjustment  to  the  program’s  source  code.  The  researchers  are  also  interested  in 
maximizing  another  continuous  response  variable,  the  catalyst  lifetime.  The  current 
settings  in  ARES  did  not  allow  enough  time  during  each  experiment  to  capture  results  of 
this variable. Due to the large percentage of catalyst lifetime results that cannot be
obtained,  this  RSM  process  only  focuses  on  the  initial  growth  rate  response.  Several 
other  categorical  responses  are  important  to  the  researcher,  such  as  whether  a 
synthesized  carbon  nanotube  is  single  or  multiple  walled.  RSM  is  only  appropriate  for 
continuous  response  variables,  so  categorical  or  binary  responses  are  not  included  in  this 
study. 

1.5  Research  Objectives 

The  following  objectives  are  defined  for  this  thesis. 

1.  Determine  the  most  suitable  experimental  designs  to  model  six  factors  with  a 
specialization  to  include  three  mixture  variables. 
2. Incorporate the experimental designs into an RSM process to maximize the
initial  growth  rate  in  a  quick  and  efficient  manner. 

3.  Create  a  user  interface  that  fosters  trust  and  awareness  with  the  researcher. 

4. Automate the decision process concerning which set of experiments is selected
for  each  stage  of  the  RSM  process. 

5.  Determine  the  challenges  of  autonomous  experimentation. 

1.6  Thesis  Overview 

The  remainder  of  this  document  is  organized  as  follows.  This  first  chapter 
introduces  the  topic  of  interest  and  the  research  objectives.  The  second  chapter  provides 
an  in-depth  review  on  important  background  information  and  the  analytical  techniques 
applied  in  this  thesis.  The  third  chapter  contains  a  detailed  description  of  the 
methodology  used  to  accomplish  the  research  objectives.  The  fourth  chapter  includes 
and  describes  the  results  from  the  implementation  of  the  RSM  experiment  planner. 
Finally,  the  fifth  chapter  will  discuss  analytical  conclusions  and  recommendations  for 
future  research. 
II. Background


This  chapter  is  a  comprehensive  overview  of  the  subject  matter  and  the  analytical 
techniques  that  are  applied  in  this  study.  The  first  section  is  an  overview  of  Response 
Surface  Methodology  (RSM).  The  second  section  explains  carbon  nanotube  growth 
factors  and  response  variables.  The  third  section  discusses  important  considerations  for 
the  Adaptive  Rapid  Experimentation  and  Spectroscopy  (ARES)  system.  The  fourth 
section  discusses  the  advantages  and  disadvantages  of  the  current  ARES  experimental 
planners.  The  fifth  section  explains  the  analytical  techniques  applied  within  the  RSM 
process. 

2.1  Response  Surface  Methodology  Overview 

A  statistical  approach  to  experimental  design  helps  to  draw  meaningful 
conclusions  from  data  (Montgomery,  2008:11).  It  is  difficult  to  understand  the  true 
relationships  between  the  inputs  and  outputs  of  a  system  without  a  structured 
experimental  design.  In  the  1920s  and  1930s,  Sir  Ronald  A.  Fisher’s  statistical  analysis 
of  agricultural  data  led  to  the  three  primary  principles  of  DOE:  randomization,  blocking, 
and  replication  (Montgomery,  2008:21).  His  later  work  led  to  factorial  designs  and 
Analysis  of  Variance  (ANOVA)  which  are  also  cornerstones  of  DOE.  Applications  of 
statistical  design  continued  to  increase  during  the  industrial  era  with  the  advent  of  RSM 
by  Box  and  Wilson  in  1951  (Montgomery,  2008:21).  RSM  expands  on  DOE  to  solve 
problems  that  require  mapping  a  response  surface,  response  optimization,  and  selection 
of  optimal  operation  conditions  (Myers  et  al.,  2009:8).  RSM  is  limited  to  problems  that 
contain  continuous  independent  and  response  variables. 
RSM can be a useful technology in the formulation of new products due to its
capability  to  optimize  response  variables  and  find  desired  operating  conditions.  Most 
RSM  applications  are  sequential  procedures  that  can  involve  multiple  iterations  of 
experimental  designs  and  analysis  (Myers  et  al.,  2009:6).  This  experimental  procedure  is 
performed  within  the  feasible  operability  region  encompassed  by  the  independent 
variable  space  (Myers  et  al.,  2009:7).  In  problems  involving  more  than  three 
independent  variables,  mapping  the  response  surface  over  the  entire  region  of  operability 
is  usually  impractical  and  cumbersome.  Therefore,  the  sequential  procedure  consists  of 
smaller  regions  of  experimentation  and  statistical  models  that  are  utilized  to  search  the 
operability  region  for  the  optimal  response  location  (Myers  et  al.,  2009:8).  When  near 
the  optimal  response  location,  a  higher-order  statistical  model  is  applied  to  the  region  of 
experimentation  to  better  characterize  the  response  surface  and  discover  important 
results. 

2.2  Carbon  Nanotube  Growth  Factors  and  Response 
2.2.1  Growth  Factors 

The Adaptive Rapid Experimentation and Spectroscopy (ARES) system
performs Chemical Vapor Deposition (CVD) to synthesize carbon nanotubes (Rao et al.,
2012). CVD requires three main ingredients: a heat source, a hydrocarbon gas mixture,
and a metallic catalyst (Nikolaev et al., 2015). ARES provides heat using a high-powered
laser. The hydrocarbon gas mixture is a combination of up to three different
gases: ethylene, hydrogen, and argon. The catalyst is a chemical compound that typically
consists of at least one of the following elements: cobalt, iron, nickel, or aluminum
(Nikolaev et al., 2015). The ARES system can also control total pressure and water
concentration. The total pressure is the pressure of the gas chamber and is completely
independent  from  the  hydrocarbon  gas  mixture.  The  water  concentration  acts  as  a 
cooling  agent  for  the  catalyst  to  prolong  the  catalyst  lifetime  (Nikolaev  et  al.,  2015). 

The  catalyst  type  certainly  influences  carbon  nanotube  growth,  but  it  is  not  a 
factor  that  is  changeable  between  individual  experiments.  The  catalyst  type  is  held 
constant  for  each  replication  of  the  RSM  process.  For  laser  power,  the  independent 
variable is the calibrated temperature in Celsius at initial growth. The calibrated
temperature regularly differs slightly from the planned temperature. The
total  pressure  is  adjusted  and  measured  in  units  of  torr.  Water  concentration  is  adjusted 
and  measured  in  parts  per  million  (ppm).  Temperature,  pressure,  and  water 
concentration  are  process  variables,  and  are  adjusted  independently  without  impacting 
the  setting  of  another  variable  (Cornell,  2011:354).  The  hydrocarbon  gas  mixture 
contains  three  different  mixture  variables.  Mixture  variables  differ  from  process 
variables,  because  the  proportion  of  one  of  the  components  must  decrease  if  another 
proportion  is  increased  (Smith,  2007:3).  Since  mixture  variables  are  not  independent 
factors,  different  techniques  are  used  to  include  them  with  process  variables  in  the  same 
experimental  design  and  linear  model.  The  mixture  variables  are  adjusted  in  ARES  as 
flow  rates  (standard  cubic  centimeters  per  minute),  but  are  measured  in  the  planner  as 
percentages  of  the  total  mixture. 

Engineering  (actual)  units  are  used  when  the  planner  is  providing  the  experiment 
settings  to  ARES.  However,  all  of  the  design  creation  and  analysis  is  executed  in  coded 
units. Coded units enable orthogonal test matrices when properly designed and evenly
scale each factor to make coefficient estimates comparable (Montgomery, 2008:290).
All factor levels are coded to fall within the [-1, 1] range; minimum settings map to -1
and maximum settings map to 1. The coded units included in the first-order design are
-1, 0, and 1 for the low, center, and high factor levels, respectively.
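
The coding described above is a linear rescaling of each factor's range. The following Python sketch illustrates one way to convert between engineering and coded units; the function names and the 700 to 900 degree Celsius range are illustrative assumptions, not values taken from the planner or from ARES.

```python
def to_coded(value, low, high):
    """Map an engineering value in [low, high] to coded units in [-1, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    center = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return (value - center) / half_range


def to_engineering(coded, low, high):
    """Map a coded value in [-1, 1] back to engineering units."""
    if high <= low:
        raise ValueError("high must be greater than low")
    center = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return center + coded * half_range


# Hypothetical temperature range (degrees Celsius), for illustration only.
LOW_T, HIGH_T = 700.0, 900.0

# Low, center, and high levels of a first-order design in coded units.
levels = [to_coded(t, LOW_T, HIGH_T) for t in (700.0, 800.0, 900.0)]
print(levels)
```

Because the mapping is linear, a design built once in coded units can be reused for any factor by supplying that factor's own low and high settings.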

2.2.2  Response  Variable 

The  overall  goal  of  the  current  carbon  nanotube  research  is  to  maximize 
production. Total production is a function of the rate of production, production
sustainability, and time. Carbon nanotube production acts in a similar manner. ARES
monitors a G-band which is indicative of graphitic carbon (Rao et al., 2012). Basically,
this monitor measures carbon nanotube growth over time. When plotted over time, two
other important parameters are the initial growth rate and the time constant. These two
parameters are shown in the growth curve example in Figure 2.1 as $\nu$ and $\tau$,
respectively.


Figure 2.1. Carbon Nanotube Growth Curve Example.

(Reprinted  from  (Rao  et  al.,  2012)  with  permission  from  the  Nature  Publishing  Group) 
The initial growth rate $\nu$ represents the estimated initial slope of the predicted
G-band line. The time constant $\tau$ is also referred to as the catalyst lifetime, because it
represents when the growth curve levels off. The growth curve is estimated by the
self-exhausting exponential formula (Rao et al., 2012)

$$G(t) = \nu\tau\left[1 - \exp(-t/\tau)\right] \tag{1}$$

where  t  is  time  in  seconds.  ARES  automates  the  creation  of  this  growth  curve  and  the 
parameter  estimation,  but  occasionally  the  researcher  must  manually  estimate  the 
parameters  when  there  is  an  issue  with  data  capture. 
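
As a numerical check of Equation 1, the short Python sketch below evaluates the growth curve and confirms that its early slope approaches the initial growth rate while its plateau approaches the product of the initial growth rate and the time constant. The function name and the parameter values are illustrative assumptions and are not ARES measurements.

```python
import math


def g_band(t, nu, tau):
    """Evaluate Equation 1: G(t) = nu * tau * (1 - exp(-t / tau))."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    # expm1 keeps precision for small t / tau, where 1 - exp(-x) loses digits.
    return -nu * tau * math.expm1(-t / tau)


# Hypothetical parameters: initial growth rate nu and time constant tau (seconds).
nu, tau = 2.0, 50.0

# The slope near t = 0 should be close to nu.
early_slope = g_band(1e-3, nu, tau) / 1e-3

# Far beyond the time constant, the curve should level off near nu * tau.
plateau = g_band(100 * tau, nu, tau)

print(early_slope, plateau)
```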

When the maximum G-band is reached at the expiration of catalyst lifetime,
Equation 1 reduces to $\nu\tau$. The researchers are particularly interested in maximizing
$\nu\tau$ in order to maximize production. However, the time constant is difficult to obtain
during each experiment, as the result can extend beyond the allotted time for data capture.
Due to the high frequency of experiments that do not include valid results, the time constant
response variable is not included in the RSM process. Fortunately, maximizing the
initial growth rate still provides a positive contribution towards maximizing $\nu\tau$.

2.3  ARES  System  Overview 

The  ARES  system  is  the  overarching  software  that  controls  and  monitors  the 
carbon  nanotube  experimentation  process  (Nikolaev  et  al.,  2015).  Experimental  planners 
are  an  additional  feature  of  ARES  but  are  only  instructions  to  perform  certain  sets  of 
experiments or runs. These experiment plans are written to a data file that acts as the
main  line  of  communication  between  ARES  and  the  planner.  The  planner  is  not  involved 
with  the  physical  experimentation  process  after  the  experiment  plan  is  submitted. 
In fact, the planner program is completely closed after submitting the planner data
file.  RSM  is  a  continuous  process,  so  an  input/output  data  system  must  keep  track  of  the 
planner’s  status  and  necessary  data  after  each  set  of  planned  experiments. 

This  study  focuses  on  autonomy  in  experimental  planning,  but  there  are  many 
aspects  of  ARES  that  are  not  yet  autonomous.  The  researcher  must  perform  a  series  of 
various  calibrations  and  alignments  on  the  system  before  each  set  of  experiments.  The 
amount of experimentation per set is also limited. Experiments are performed one
patch at a time and each patch contains 25 silicon pillars (Rao et al., 2012). Theoretically,
up  to  25  experiments  can  execute  in  a  single  experiment  plan,  but  this  is  rarely  a  feasible 
option.  The  researcher  uses  a  camera  and  microscope  to  identify  which  pillars  are 
available  on  each  patch.  Typically,  many  pillars  are  unavailable  due  to  previous 
experimentation  or  are  scattered  with  debris  from  neighboring  pillars  that  overheat 
during experimentation (Nikolaev et al., 2015). To accommodate the restriction in
the number of experiments, the planner should consistently provide experiment sets of
fewer than about 15 runs. Sets of only a few experiments are also not preferred due to the
amount  of  setup  time  required. 

The amount of experimentation is limited on each patch, so it is very likely that
results originate from multiple patches. Analysis of previous growth data revealed a
possible change in the initial growth rate depending on the patch used to experiment
(Nikolaev et al., 2015). The blocking principle is a useful technique to minimize the
potential increase in variance from the change in patch (Montgomery, 2008:13).
Blocking is explained in further detail in Section 2.5.2 for the first-order design and
Section 2.5.5 for the second-order design. The only other major nuisance factors the
researchers identified were laser temperature related. The ambient temperature in the
ARES  lab  and  the  laser  temperature  calibration  seem  to  affect  growth  results  (Nikolaev 
et  al.,  2015).  Occasionally,  the  calibrator  does  not  produce  actual  temperatures  that  are 
close  to  the  planned  temperatures.  Unfortunately,  incorporating  the  actual  calibrated 
laser  temperatures  into  the  planner’s  analysis  is  not  currently  an  option. 

2.4  Current  ARES  Experiment  Planners 

The  RSM  planner  is  incorporated  into  the  list  of  available  experimental  planners 
on  ARES.  The  two  most  prominent  planners  apply  machine  learning  techniques:  an 
artificial  neural  network  and  a  random  forest.  Below  are  some  of  the  disadvantages  of 
these  current  planners: 

1.  The  linear  dependency  of  mixture  variables  is  not  taken  into  consideration. 
Without  the  proper  mixture  experimental  design,  other  techniques  tend  to  ignore 
some  significant  blending  effects  due  to  the  linear  combinations  within  the  data. 

2.  Total  pressure  is  not  assessed  as  an  independent  factor  and  is  often  combined 
with  the  mixture  variables  to  create  partial  pressures. 

3.  The  planned  flow  rates  of  the  mixture  variables  are  not  maximized,  so  it  takes 
longer  to  prepare  each  experiment. 

4.  The  methods  are  based  on  previous  database  results.  Data  in  the  current  database 
was  not  obtained  using  experimental  design  principles.  Methods  based  on 
predicting  previous  results  often  struggle  extrapolating  these  results  to  new  areas 
of  application.  Also,  the  database  resets  whenever  the  type  of  catalyst  is  changed. 
5. The underlying analytical techniques are highly advanced, but are
computationally  rigorous  and  difficult  for  the  ARES  users  to  understand. 

6.  Instead  of  maximizing  a  response,  these  planners  request  a  response  target  value 
from  the  user.  The  users  do  not  fully  understand  how  to  adjust  the  target  value 
and  the  degree  of  extrapolation  that  is  possible. 

The  current  planners  do  offer  several  advantages  that  the  RSM  planner  is  not  expected  to 
incorporate  in  this  study: 

1. There is no requirement on the number of experiments planned in each set.
Traditional experimental designs and blocking principles are limited in the
minimum number of experiments allowable in a single block.

2.  The  current  planners  do  not  require  the  success  of  every  experiment.  The 
successful  experiments  are  added  to  the  database  and  the  unsuccessful 
experiments  do  not  affect  future  planning. 

3.  Although  the  experiments  were  not  designed  deliberately,  the  current  planners  do 
incorporate  previous  data  and  any  insights  that  may  exist  from  data  in  the  current 
database. 

2.5  Review  of  RSM  Techniques 

2.5.1 Conversion of Mixture Variables to Ratio Variables
Traditional  RSM  techniques  are  designed  for  independent  variables.  The 
inclusion  of  mixture  variables  into  the  process  presents  the  decision  to  either  adjust 
RSM  techniques  to  accommodate  mixture  variables  or  convert  the  mixture  variables  to 
independent  variables.  There  are  many  different  techniques  to  include  both  mixture  and 
process variables in the same experimental design and regression model. A well-known
technique is the Cartesian join which involves appending a mixture design onto each
process variable experiment run (Smith, 2005:303). This approach has two major
drawbacks: many experiments are required for even the most modest design and the
search  path  is  limited  to  only  process  variables.  Also,  analysis  of  a  mixture  response 
surface  is  especially  difficult  within  an  automated  program. 

The other approach is to transform $q$ mixture variables into $q - 1$ independent
variables. The ratio variable method, presented by John Cornell in “Experiments with
Mixtures”, is particularly easy to apply in an automated program (Cornell, 2011:305). The
only  requirement  is  that  each  ratio  has  a  component  that  is  included  in  the  other  ratios  in 
the  same  set  (Cornell,  2011:306).  In  this  study,  the  three  mixture  variables  are  converted 
into  two  ratio  variables  to  ensure  that  traditional  RSM  techniques  and  experimental 
designs  are  viable  throughout  the  entire  process.  All  coded  experimental  designs  now 
include  a  total  of  five  independent  variables.  Since  ethylene  is  in  every  experiment,  the 
percentage of this gas, $x_2$, is the denominator of both ratios to eliminate the possibility of
ever dividing by zero. The percentages of argon and hydrogen gases are represented as
$x_1$ and $x_3$, respectively. The first ratio, $r_1 = x_1/x_2$, is argon per ethylene and the
second ratio, $r_2 = x_3/x_2$, is hydrogen per ethylene. When engineering units are required,
the following three equations convert the ratio variables back into mixture percentages:

Argon (Ar): $$x_1 = \frac{r_1}{1 + r_1 + r_2} \tag{2}$$

Ethylene (C2H4): $$x_2 = \frac{1}{1 + r_1 + r_2} \tag{3}$$

Hydrogen (H2): $$x_3 = \frac{r_2}{1 + r_1 + r_2} \tag{4}$$

These three equations are derived from the two ratio formulas and the mixture
requirement that $x_1 + x_2 + x_3 = 1$.
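
The ratio transformation and its inverse can be checked with a round trip. The Python sketch below is a minimal illustration of Equations 2 through 4; the function names and the example proportions are assumptions made for this example and do not come from the planner's source code.

```python
def mixture_to_ratios(x1, x2, x3):
    """Convert mixture proportions (argon, ethylene, hydrogen) to ratios.

    Returns (r1, r2), where r1 = x1 / x2 and r2 = x3 / x2.
    """
    if x2 <= 0:
        raise ValueError("ethylene proportion x2 must be positive")
    return x1 / x2, x3 / x2


def ratios_to_mixture(r1, r2):
    """Convert ratios back to mixture proportions using Equations 2-4."""
    if r1 < 0 or r2 < 0:
        raise ValueError("ratios must be non-negative")
    total = 1.0 + r1 + r2
    return r1 / total, 1.0 / total, r2 / total


# Hypothetical mixture: 20% argon, 50% ethylene, 30% hydrogen.
r1, r2 = mixture_to_ratios(0.2, 0.5, 0.3)
x1, x2, x3 = ratios_to_mixture(r1, r2)
print(r1, r2, x1, x2, x3)
```

The recovered proportions always sum to one, because each shares the denominator $1 + r_1 + r_2$ and the numerators add up to that same quantity.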

ARES  only  accepts  the  mixture  variables  in  the  flow  rate  form,  so  an  additional 
conversion  is  required  to  plan  experiments.  The  researchers  want  the  total  flow  rate  large 
to  accelerate  the  gas  insertion  process.  The  optimal  setting  of  flow  rates  is  calculated 
with  a  Linear  Program  (LP).  An  LP  model  is  a  set  of  mathematical  functions  where 
linearity  exists  in  both  the  objective  and  constraint  functions  (Hillier  and  Lieberman, 
2005:12). The LP model has three decision variables for flow rate, $f_1$, $f_2$, $f_3$, that
represent argon, ethylene, and hydrogen, respectively. The constraints shown in
Equations 6 through 8 each include one slack variable, $s_1$, $s_2$, $s_3$. The LP objective
function is shown in Equation 5. The constraints of the LP model are shown in
Equations 6 through 11.


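Maximize:

$$\text{Total flow} = f_1 + f_2 + f_3 \qquad (5)$$

Subject to:

$$f_1 + s_1 = 20 \qquad (6)$$

$$f_2 + s_2 = 17.2 \qquad (7)$$

$$f_3 + s_3 = 50.5 \qquad (8)$$

$$f_1 - r_1 f_2 = 0 \qquad (9)$$

$$f_3 - r_2 f_2 = 0 \qquad (10)$$

$$f_1, f_2, f_3, s_1, s_2, s_3 \ge 0 \qquad (11)$$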

Equations 6 through 8 are the constraints for the maximum flow rate setting for argon,
ethylene, and hydrogen, respectively. Equations 9 and 10 are the constraints that ensure
the two ratio variable relationships are achieved. Equation 11 constrains all decision
variables to be non-negative.

A  simplified  technique  was  discovered  to  easily  solve  this  LP  within  the  C# 
environment.  The  LP  model  contains  six  decision  variables  and  five  constraints. 
Therefore, when $r_1 > 0$ and $r_2 > 0$, only one variable is nonbasic, or set equal to zero,
in the final solution (Hillier and Lieberman, 2005:110). All of the flow rates are nonzero,
so one of the slack variables must be nonbasic. The nonbasic slack variable is
derived through a minimum ratio analysis from the simplification of the simplex
method. When both ratios are nonzero, the smallest value out of $20 / r_1$, $17.2$, or
$50.5 / r_2$ determines if $s_1$, $s_2$, or $s_3$ is the nonbasic variable, respectively. Each
value is the largest ethylene flow rate that the corresponding constraint allows. When
$r_1 = 0$ or $r_2 = 0$, the
unaffected  values  are  assessed  in  the  minimum  ratio  analysis.  After  the  nonbasic 
variable  is  determined,  the  flow  rate  of  the  variable  included  in  that  constraint  is  set  to 
the  right-hand-side  value.  The  other  two  flow  rates  are  easily  calculated  using  the  two 
ratio  constraint  equations. 
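The minimum ratio shortcut described above can be sketched as follows. This is a minimal Python illustration, not the C# code used in the planner; the function name `max_flow_rates` and its argument layout are hypothetical, and the default limits are the right-hand-side values of Equations 6 through 8.

```python
def max_flow_rates(r1, r2, limits=(20.0, 17.2, 50.5)):
    """Return (f1, f2, f3) that maximize total flow for given ratios.

    r1 = argon per ethylene, r2 = hydrogen per ethylene.
    limits = maximum flow rates for argon, ethylene, hydrogen.
    """
    if r1 < 0 or r2 < 0:
        raise ValueError("ratio variables must be non-negative")
    f1_max, f2_max, f3_max = limits

    # Each constraint implies an upper bound on the ethylene flow f2.
    # A ratio of zero leaves that gas at zero flow, so its bound is skipped.
    candidates = [f2_max]
    if r1 > 0:
        candidates.append(f1_max / r1)
    if r2 > 0:
        candidates.append(f3_max / r2)

    # The binding (smallest) bound identifies the nonbasic slack variable.
    f2 = min(candidates)
    return r1 * f2, f2, r2 * f2


print(max_flow_rates(2.0, 1.0))  # argon limit binds: (20.0, 10.0, 10.0)
```

Since total flow equals $(1 + r_1 + r_2) f_2$ once the ratio constraints are imposed, maximizing the ethylene flow maximizes the total, which is why a single minimum suffices in place of a full simplex solve.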

2.4.2  First  Order  Design 

Factorial  designs  are  particularly  useful  to  investigate  main  effects  and 
interactions on a response variable. The $2^k$ factorial design is important for two major
aspects of the RSM process: to generate the factor estimates required in the path of
steepest ascent and as a building block to create other response surface designs (Myers
et al., 2009:73). These designs are labeled $2^k$ because k factors are considered at only
two factor levels. A full $2^k$ design requires $2^k$ experiments, but this number can be
reduced  depending  on  what  information  is  needed.  Fractional  factorial  designs  are  based 
on  the  sparsity  of  effects  principle  that  a  system  is  largely  impacted  by  main  effects  and 
low-order  interactions  rather  than  high-order  interactions  (Montgomery,  2008:321).  Due 
to  the  unlikelihood  of  higher-order  interactions,  a  fractional  factorial  design  combines 
main  effects  and  low-order  interactions  with  higher-order  interactions  (Montgomery, 
2008:322).  These  combined  effects  are  referred  to  as  aliases. 

Various  fractional  factorial  design  options  are  compared  using  the  design 
resolution  method.  A  resolution  V  design  ensures  that  no  main  effect  or  two-factor 
interaction  is  aliased  with  another  main  effect  or  two-factor  interaction  (Montgomery, 
2008:324). A resolution V design produces quality main effect estimates for the path of
steepest ascent and can easily be augmented to a second-order design (Myers et al.,
2009:298). This design is also orthogonal for a model containing main effects and
two-factor interactions, which ensures linear independence and minimizes variance (Myers et
al., 2009:286). Orthogonality is a very useful property, because it eliminates
multicollinearity  in  the  regressor  variables  (Montgomery  et  al.,  2012:118). 
Multicollinearity  is  a  common  problem  in  data  that  is  not  collected  from  an  experimental 
design.  Multicollinearity  can  cause  inflated  or  erroneous  effect  estimates  due  to  the  near- 
linear  dependencies  within  the  data  (Montgomery  et  al.,  2012:285). 

For a design with five factors, a half fraction $2^{5-1}$ produces a resolution V
design. Therefore, 16 fewer runs are required than in the full $2^5$ design to generate
effect estimates of a similar quality. In addition to the 16 fractional factorial runs, the
first-order design should include center point runs to test for lack of fit and estimate
pure error. Typically, at least three center point runs are recommended for the lack of fit
test and to augment to a second-order model (Montgomery, 2008:288). Four center point
runs are included in the first-order design to allow for an even distribution if blocking is
necessary. The experiment order of each first-order design is randomized. Randomization
is a design technique applied to minimize the effects of uncontrollable nuisance factors
(Montgomery, 2008:139). The full $2^{5-1}$ design is usually difficult to accommodate, so it
is split into two separate blocks of ten runs. Blocking is a critical noise reduction
technique that ensures that any nuisance variability is not wrongfully attributed to certain
effect estimates and does not inflate the estimate of experimental error (Montgomery,
2008:313). The block designs are generated using the two-factor interaction for total
pressure and water concentration. Through discussion with the researchers, this two-factor
interaction was deemed the least likely to significantly affect the response (Nikolaev et
al., 2015). Each block contains two center point runs and is randomized independently from
the other block.
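A blocked half-fraction design of this kind can be generated with a few lines of code. The sketch below is illustrative Python, not the planner's implementation. Two details are assumptions because the text does not state them: the design generator E = ABCD (the usual choice for a resolution V half fraction of five factors) and the column order (ratio 1, ratio 2, pressure, temperature, water concentration), which places the blocking interaction on columns 2 and 4.

```python
import itertools
import random


def half_fraction_design(block_factors=(2, 4), centers_per_block=2, seed=None):
    """Build a coded 2^(5-1) design split into two randomized blocks.

    block_factors: column indices whose two-factor interaction
    defines the blocks (assumed: pressure and water concentration).
    """
    rng = random.Random(seed)
    i, j = block_factors
    blocks = {1: [], 2: []}
    for a, b, c, d in itertools.product((-1, 1), repeat=4):
        run = (a, b, c, d, a * b * c * d)  # assumed generator E = ABCD
        # The sign of the blocking interaction assigns the run to a block.
        block = 1 if run[i] * run[j] > 0 else 2
        blocks[block].append(run)
    for runs in blocks.values():
        runs.extend([(0, 0, 0, 0, 0)] * centers_per_block)  # center points
        rng.shuffle(runs)  # randomize each block independently
    return blocks


design = half_fraction_design(seed=1)
print(len(design[1]), len(design[2]))  # 10 10
```

Each block holds eight factorial runs and two center points, and every factor column sums to zero within each block, so the main effect estimates are not disturbed by a shift between blocks.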

2.4.3  Lack  of  Fit  Test 

A  factorial  design  and  the  path  of  steepest  ascent  work  well  even  in  situations 
where  the  linearity  assumption  barely  holds  (Myers  et  al.,  2009:109).  However,  a  first- 
order  model  and  design  is  typically  inappropriate  when  quadratic  effects  are  significant. 
Pure quadratic error is identified by testing whether the center point responses fall on the
same linear plane as the factorial response results (Myers et al., 2009:110). A pure
quadratic error F-test is performed on the results of each first-order design to determine
the adequacy of a first-order design. The sum of squares for pure quadratic curvature is
calculated with the formula (Myers et al., 2009:111)

$$SS_{\text{Pure quadratic}} = \frac{n_F n_C (\bar{y}_F - \bar{y}_C)^2}{n_F + n_C} \qquad (12)$$

where

$n_F$ = number of factorial runs
$n_C$ = number of center runs
$\bar{y}_F$ = average of factorial response values
$\bar{y}_C$ = average of center point values

The F-statistic is the ratio of the sum of squares for pure quadratic error to the mean
square for pure error. The mean square for pure error is calculated by the
formula (Myers et al., 2009:112)

$$MS_{\text{Pure error}} = \frac{\sum_{\text{center runs}} (y_i - \bar{y}_C)^2}{n_C - 1} \qquad (13)$$

where $y_i$ is the response value for each center experiment. Following the first-order
design experimentation, the F-statistic can be compared with an F-critical value associated
with some significance level, $\alpha$, such as 0.05 (95 percent confidence). Alternatively,
and as used here, the probability of observing an F-statistic at least this large under the
hypothesized central F distribution is calculated and returned as the test p-value. The
p-value determines whether a second-order model or the linear search process is the next
course of action. It represents the smallest level of significance at which the null
hypothesis, that the response surface over the current first-order design is linear, would
be rejected (Montgomery, 2009:40). The p-value is generated by evaluating the F-statistic
with one numerator degree of freedom and $n_C - 1$ denominator degrees of freedom.
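The test can be sketched in code from Equations 12 and 13. The Python below is illustrative and its function names are hypothetical. The p-value routine is a stand-in chosen to keep the example self-contained: it uses the identity that an F(1, d) variable is the square of a Student t variable with d degrees of freedom, and integrates the t density numerically with Simpson's rule. It is not necessarily how the planner evaluates the F distribution.

```python
import math


def student_t_pdf(x, dof):
    """Density of the Student t distribution with dof degrees of freedom."""
    coef = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coef * (1 + x * x / dof) ** (-(dof + 1) / 2)


def f_test_p_value(f_stat, dof_den, steps=2000):
    """Upper-tail p-value for an F(1, dof_den) statistic (steps must be even)."""
    t = math.sqrt(f_stat)
    h = t / steps
    total = student_t_pdf(0.0, dof_den) + student_t_pdf(t, dof_den)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, dof_den)
    area = total * h / 3  # Simpson estimate of P(0 < T < t)
    return max(0.0, 1.0 - 2.0 * area)


def pure_quadratic_test(y_factorial, y_center):
    """Return (F-statistic, p-value) for pure quadratic curvature."""
    n_f, n_c = len(y_factorial), len(y_center)
    mean_f = sum(y_factorial) / n_f
    mean_c = sum(y_center) / n_c
    ss_pq = n_f * n_c * (mean_f - mean_c) ** 2 / (n_f + n_c)      # Equation 12
    ms_pe = sum((y - mean_c) ** 2 for y in y_center) / (n_c - 1)  # Equation 13
    f_stat = ss_pq / ms_pe
    return f_stat, f_test_p_value(f_stat, n_c - 1)


# Made-up responses, for illustration only.
f_stat, p_value = pure_quadratic_test([1, 2, 3, 4], [4, 5, 6, 5])
print(f_stat)  # 18.75
```

The sketch raises a division error if all center responses are identical, since the pure error estimate is then zero; a production routine would need to handle that case explicitly.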
2.4.4 First-Order Model and Path of Steepest Ascent

The path of steepest ascent is a first-order gradient-based optimization technique
derived from the main effects of the first-order model (Myers et al., 2009:189). The
first-order regression model is obtained by the formula

$$\ln v = \beta_0 + \beta_1 r_1 + \beta_2 r_2 + \beta_3 z_1 + \beta_4 z_2 + \beta_5 z_3 + \varepsilon \qquad (14)$$

where

$v$ = predicted initial growth rate
$\beta_i$ = main effect of factor $i$
$r_1, r_2$ = ratio variables
$z_1, z_2, z_3$ = process variables
$\varepsilon$ = random error component


Montgomery  et  al.  (2012)  list  five  major  assumptions  of  regression  analysis: 


1.  The  response  and  regressors  relationship  is  at  least  approximately  linear. 

2. The error term $\varepsilon$ has a zero mean.

3. The error term $\varepsilon$ has a constant variance $\sigma^2$.

4. The errors are uncorrelated (no autocorrelation).

5.  The  errors  are  normally  distributed. 

Linear regression analysis of prior initial growth rate data revealed an issue with the
constant variance of the model residuals. The residuals of the model appeared to have a
funnel-like shape when plotted against the predicted response values, as shown in Figure
2.2.

Figure 2.2. Prior Data: Residual by Predicted Response Plot


A  natural  log  transformation  is  applied  to  the  response  to  stabilize  the  variance.  Figure 
2.3  shows  the  updated  residual  by  predicted  response  plot  with  a  variance  that  appears 
constant. 


Figure  2.3.  Prior  Data  with  Transformation:  Residual  by  Predicted  Response  Plot 
The path of steepest ascent is generated with the unit gradient approach using the
formula


where

$\Delta x_j$ = search gradient of factor $j$
$\beta_j$ = main effect of factor $j$

Each $\Delta x_j$ is multiplied by the associated factor level size to create the increments
that each factor changes during each search step. This path of steepest ascent does not
include interaction terms, because even moderately large interaction effects cause only a
slight deviation from the true path (Myers et al., 2009:190). Steepest ascent paths using
nonlinear  models  require  solving  a  series  of  constrained  nonlinear  optimizations. 
Additional  search  iterations  can  correct  large  deviations  to  the  search  path. 
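One way to read the unit gradient approach is sketched below. The sketch assumes that "unit gradient" means the vector of main effects scaled to unit length, so that each $\Delta x_j$ is $\beta_j$ divided by the square root of the sum of squared main effects. That normalization is an assumption of this example, and the function name is hypothetical.

```python
import math


def steepest_ascent_step(main_effects, level_sizes):
    """Return the per-step increment of each factor in engineering units.

    main_effects: first-order coefficients, one per factor.
    level_sizes: factor level size of each factor.
    """
    norm = math.sqrt(sum(b * b for b in main_effects))
    if norm == 0:
        raise ValueError("all main effects are zero; no ascent direction")
    unit_gradient = [b / norm for b in main_effects]  # assumed normalization
    return [g * s for g, s in zip(unit_gradient, level_sizes)]


# Made-up coefficients and level sizes, for illustration only.
step = steepest_ascent_step([3.0, 4.0, 0.0, 0.0, 0.0], [1.0, 2.0, 1.0, 1.0, 1.0])
print(step)
```

Under this reading, a factor with a negative main effect is decreased at each step and a factor with no estimated effect is held at its current level.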

2.4.5  Second-Order  Design  Augmentation  and  Model 

The Central Composite Design (CCD) is a popular second-order design that is
created by augmenting the current first-order design with axial runs (Myers et al.,
2009:298). The first-order design is resolution V, which supports the estimation of main
effects and two-factor interaction effects in the CCD. A CCD with five factors requires
ten axial point runs to estimate quadratic effects. Axial points are spaced at a certain
distance from the design center along each axis, with a single factor varied in each run
(Myers et al., 2009:297). This axial distance is important in determining the variance
properties of the CCD and for orthogonal blocking. Rotatability is an important property
of a CCD, because it provides a reasonably stable distribution of the scaled prediction
variance throughout the design region (Myers et al., 2009:305). Rotatability is achieved
when the axial distance is $\alpha = \sqrt[4]{F}$, with $F$ as the number of factorial runs
(Myers et al., 2009:307).

Blocking  is  another  important  consideration  when  the  CCD  is  created  by 
augmentation. Orthogonally blocking the axial point augmentation with the previous
first-order design minimizes the impact of nuisance factors on the quadratic effects (Myers
et al., 2009:325). To achieve orthogonal blocking, the axial distance is calculated with the
formula


where

$F$ = number of first-order factorial points
$F_0$ = number of first-order center runs
$k$ = number of factors
$a_0$ = number of center runs in axial block

When zero center point runs are included in the axial block, the axial distance achieves
both rotatability and orthogonal blocking. Therefore, the axial distance is set to
$\alpha = 2.0$.
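The agreement between the two axial distances can be checked numerically. The sketch below is illustrative Python with hypothetical function names. The closed form used for the orthogonal blocking distance, $\sqrt{F(2k + a_0) / (2(F + F_0))}$, is the standard textbook condition for a CCD written with the symbols defined above; it is stated here as an assumption rather than as a transcription of the formula in this section.

```python
import math


def rotatable_axial_distance(n_factorial):
    """Axial distance for rotatability: fourth root of the factorial run count."""
    return n_factorial ** 0.25


def orthogonal_blocking_axial_distance(n_factorial, n_factorial_centers,
                                       k, n_axial_centers):
    """Axial distance for orthogonal blocking (assumed standard CCD condition)."""
    numerator = n_factorial * (2 * k + n_axial_centers)
    denominator = 2.0 * (n_factorial + n_factorial_centers)
    return math.sqrt(numerator / denominator)


# Values from this study: 16 factorial runs, 4 center runs, 5 factors,
# and no center runs in the axial block.
print(rotatable_axial_distance(16))                      # 2.0
print(orthogonal_blocking_axial_distance(16, 4, 5, 0))   # 2.0
```

With the values used in this study both conditions give 2.0. Adding center runs to the axial block would push the blocking distance above 2.0 and the two properties would no longer coincide.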

The  second-order  model  includes  main  effects,  quadratic  terms,  and  two-factor 
interaction  terms.  This  model  is  calculated  by  the  formula 
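
$$
\begin{aligned}
\ln v ={} & \beta_0 + \beta_1 r_1 + \beta_2 r_2 + \beta_3 z_1 + \beta_4 z_2 + \beta_5 z_3 \\
& + \beta_{11} r_1^2 + \beta_{22} r_2^2 + \beta_{33} z_1^2 + \beta_{44} z_2^2 + \beta_{55} z_3^2 \\
& + \beta_{12} r_1 r_2 + \beta_{13} r_1 z_1 + \beta_{14} r_1 z_2 + \beta_{15} r_1 z_3 + \beta_{23} r_2 z_1 \\
& + \beta_{24} r_2 z_2 + \beta_{25} r_2 z_3 + \beta_{34} z_1 z_2 + \beta_{35} z_1 z_3 + \beta_{45} z_2 z_3 + \varepsilon
\end{aligned}
\qquad (17)
$$

where

$v$ = predicted initial growth rate
$\beta_i$ = main effect of factor $i$
$\beta_{ii}$ = quadratic effect of factor $i$
$\beta_{ij}$ = interaction effect of factor $i$ with factor $j$
$r_1, r_2$ = ratio variables
$z_1, z_2, z_3$ = process variables
$\varepsilon$ = random error component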

2.4.6  Canonical  Analysis 

The  canonical  analysis  uses  the  coefficients  of  the  second-order  model  to 
determine the location of the stationary point within the design region. The stationary
point can represent a minimum, maximum, saddle point, or ridge system depending on these
coefficients. First, to calculate the stationary point, the B matrix is assembled with
second-order and interaction coefficients:

$$
B = \begin{bmatrix}
\beta_{11} & \beta_{12}/2 & \cdots & \beta_{15}/2 \\
 & \beta_{22} & \cdots & \beta_{25}/2 \\
 & & \ddots & \vdots \\
\text{sym.} & & & \beta_{55}
\end{bmatrix}
\qquad (18)
$$

A vector of main effects $b = [\beta_1, \beta_2, \beta_3, \beta_4, \beta_5]$ and $B$ are used to
calculate the stationary point $x_s$ (Myers et al., 2009:223). The stationary point is
calculated with the formula

$$x_s = -\frac{1}{2} B^{-1} b \qquad (19)$$

This stationary point is in coded units referenced from the center of the second-order
design. The conversion to engineering units is applied before reporting the results to the
user. The predicted response at the stationary point is calculated by the formula (Myers
et al., 2009:224)

$$\hat{y}_s = b_0 + \frac{1}{2} x_s' b \qquad (20)$$

where $b_0$ is the estimated intercept of the second-order model.
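Equations 19 and 20 can be illustrated with a small self-contained sketch. The Python below is illustrative only; the function names are hypothetical, and a plain Gaussian elimination is used in place of an explicit matrix inverse so that no external library is needed. The example uses a made-up two-factor model rather than coefficients from this study.

```python
def solve_linear(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("B is singular; no unique stationary point")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def stationary_point(b0, b, B):
    """Return (x_s, y_s) in coded units from Equations 19 and 20."""
    # x_s = -(1/2) B^{-1} b is equivalent to solving B x_s = -b / 2.
    x_s = solve_linear(B, [-0.5 * bi for bi in b])
    y_s = b0 + 0.5 * sum(xi * bi for xi, bi in zip(x_s, b))
    return x_s, y_s


# Made-up model: y = 10 + 2 x1 + 4 x2 - x1^2 - 2 x2^2
x_s, y_s = stationary_point(10.0, [2.0, 4.0], [[-1.0, 0.0], [0.0, -2.0]])
print(x_s, y_s)  # [1.0, 1.0] 13.0
```

The nature of the stationary point (maximum, minimum, saddle, or ridge) depends on the signs of the eigenvalues of $B$, which this sketch does not compute.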
III. Methodology


This  chapter  explains  the  computer  program  that  autonomously  plans 
experiments  to  optimize  carbon  nanotube  growth.  The  first  section  overviews  the 
Response Surface Methodology (RSM) process. The succeeding sections discuss the
algorithms within the sub-processes of the system. The second section explains the start of
the  RSM  process  and  first-order  design  generation  flow.  The  third  section  explains  the 
lack  of  fit  test,  the  linear  regression  model,  and  the  search  process  flow.  The  fourth 
section  explains  the  second-order  design  generation,  the  second-order  regression  model, 
and  the  canonical  analysis  flow.  The  fifth  section  explains  the  analysis  of  solutions  from 
the  model.  The  last  section  discusses  the  challenges  of  applying  RSM  procedures  to 
create  experimental  autonomy  within  the  Adaptive  Rapid  Experimentation  and 
Spectroscopy  (ARES)  system. 

3.1  RSM  Model  Overview 

The RSM process begins with manual input of the experimental boundary, the
factor level sizes, and the initial start location. The planner creates a randomized
first-order design with the information from the user’s inputs. Center point experiments
are included in the first-order design to test for lack of fit. The lack of fit test
determines the significance of quadratic curvature in the current response surface. If the
surface appears linear, the planner calculates a first-order regression model to find the
gradient used to search outside of the region. The search progresses until the response
variable stops increasing in value. The planner executes a series of first-order designs and
linear searches until the surface appears non-linear. When the lack of fit test detects
significant curvature in the response surface, additional axial runs are augmented to the
first-order design to complete a second-order design. The planner calculates a
second-order regression model from the augmented design and performs canonical analysis. The
planner  displays  the  optimal  response  value  and  the  corresponding  factor  level  settings 
to  the  user.  Users  can  save  the  solution  and  compare  with  previous  results  to  improve 
future  RSM  processes.  Figure  3.1  shows  the  flowchart  of  the  RSM  overview. 


Figure  3.1.  RSM  Overview  Flowchart 

The  planner  updates  a  status  indicator  to  advance  to  subsequent  stages  of  the 
RSM  process  after  experiments  are  planned  for  ARES.  ARES  does  not  interact  with  the 
RSM  planner  during  the  experiment  process.  Every  time  the  user  accesses  the  planner 
from  ARES,  the  status  indicator  is  retrieved.  The  status  dictates  what  actions  the  planner 
should  execute  to  successfully  advance  to  the  next  stage  or  event.  These  actions 
typically  involve  gathering  response  data,  performing  necessary  calculations,  and 
displaying the appropriate Graphical User Interface (GUI). Whether the user decides to
perform experiments or cancel the planner, the status is updated appropriately
before the planner exits. Sections 3.2 through 3.5 explain each possible status
indicator  and  the  actions  of  the  program  based  on  each  status.  Appendix  A  includes 
a  list  of  all  possible  status  indicators.  Appendix  B  includes  a  list  of  all  available  data 
files  used  in  the  input/output  system. 

3.2  Initial  Start  and  First  Order  Design  Flow 

The  RSM  initialization  occurs  when  the  status  equals  “Start”.  The  flowchart  in 
Figure  3.2  displays  the  actions  of  the  planner  when  this  status  is  obtained.  The  “Start” 
status  can  originate  from  three  different  possibilities:  no  previous  experimentation, 
completion  of  a  previous  RSM  process,  or  from  a  manual  decision  to  restart  the  process. 
In  two  of  these  possibilities,  some  of  the  stored  data  files  contain  superseded 
information.  Therefore,  the  first  step  is  to  clear  all  data  files  except  for  the  file  that  stores 
previous  RSM  results. 

Figure 3.2. Initial Start and First-Order Design Flowchart


The  start  menu  GUI,  shown  in  Figure  3.3,  opens  immediately  after  the  data  files 
are  cleared.  The  first  button  opens  a  spreadsheet  that  displays  the  results  from  previous 
RSM  processes.  Analysis  of  previous  results  can  assist  in  determining  future  input 
settings.  The  lower  section  of  the  start  menu  contains  the  buttons  that  allow  the  user  to 
alter the input settings. The red notice informs the user that at least ten experiments must
be available to advance to the first-order design phase. If fewer than ten experiments are
available, a warning message appears after the user confirms that the first-order design
should be run. In that situation, the program does not plan any experiments and the
status does not change.

[Screenshot of the RSM Planner start menu: a welcome message describing the planner, a
“View Previous Results” button, a “Start a New RSM Procedure” section noting that at
least 10 usable pillars are required to start, input buttons for “Factor Level Ranges” and
“Factor Level Size”, a “Random Location” or “Specify Location” choice for where to start
the search, and “Cancel” and “Get Started” buttons.]


Figure  3.3.  Initial  Start  Menu  GUI 


The  first  input  button  opens  another  menu  to  adjust  the  factor  level  ranges.  The 
range  adjustment  menu  is  shown  in  Figure  3.4.  This  menu  loads  default  values  for  the 
minimum  and  maximum  ranges  provided  by  the  researchers  prior  to  coding  the  program. 
There  are  two  common  reasons  to  adjust  the  factor  level  ranges.  First,  the  experimental 
region  may  expand  if  the  capabilities  of  the  ARES  system  increase  in  the  future.  Second, 
the user may lose interest in experimenting in certain areas of the design space. The
input  boxes  of  the  ranges  allow  the  user  to  type  in  any  text  value.  These  values  must  be 
feasible  settings  that  ARES  can  execute.  The  program  will  return  an  error  message  if  a 
maximum  range  is  less  than  a  minimum  range. 

[Screenshot of the factor level range adjustment menu: a short explanation of the factor
level ranges, minimum and maximum input boxes for Argon (%), Ethylene (%), Hydrogen (%),
Pressure (Torr), Temperature (Celsius), and Water Concentration (ppm), and “Cancel” and
“Continue With Changes” buttons.]


Figure  3.4.  Factor  Level  Range  Adjustment  Menu  GUI 

The  next  input  option,  shown  in  Figure  3.5,  is  the  factor  level  size  adjustment 
menu.  This  menu  also  loads  default  values  mostly  based  on  the  suspected  noise  of  each 
factor  provided  by  the  researchers.  Unlike  the  factor  level  range  menu,  the  size 
adjustment menu displays ratio variables instead of mixture percentages since the
three mixture levels cannot change independently. This menu contains numeric
textboxes  that  allow  the  user  to  incrementally  change  the  level  size  values  by  clicking  on 
the  corresponding  arrow.  The  benefit  of  numeric  textboxes  is  that  only  numeric  values 
within  a  predetermined  minimum  and  maximum  range  can  be  entered.  After  several 
iterations  of  the  RSM  process,  the  user  should  consider  rescaling  the  factor  level  sizes. 
Level sizes should be large enough to overcome noise, but not so large that the effect of
a single factor unintentionally dominates the main effects regression model
(Montgomery, 2009:256).

Figure 3.5. Factor Level Size Adjustment Menu GUI


The  last  user  input  is  the  initial  search  location  of  the  RSM  model.  The  two 
options  available  are  a  random  or  user-specified  location.  The  random  location  is 
generated  within  the  inner  sixty  percent  of  the  design  region  to  avoid  immediately 
searching  near  a  boundary.  The  start  location  menu,  shown  in  Figure  3.6,  uses  numeric 
textboxes  to  input  the  desired  start  location.  The  range  of  the  numeric  textboxes  adjusts 
based  on  the  information  from  the  factor  boundaries  and  level  size.  This  adjustment 
ensures  that  the  initial  first-order  design  is  within  the  region  of  operability.  The  selection 
of  the  initial  location  is  solely  a  user  preference,  but  previous  research  knowledge  and 
RSM  results  should  factor  into  the  decision. 
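The random start option can be sketched as follows. This Python illustration assumes that the "inner sixty percent" is applied to each factor separately, leaving a twenty percent margin at each end of that factor's range; the text does not spell out the exact rule. The function name, factor names, and bounds are hypothetical.

```python
import random


def random_start(bounds, inner_fraction=0.6, rng=None):
    """Draw a start location inside the central part of each factor range.

    bounds: mapping of factor name to (minimum, maximum).
    inner_fraction: central share of each range that may be sampled.
    """
    rng = rng or random.Random()
    start = {}
    for name, (low, high) in bounds.items():
        margin = (1.0 - inner_fraction) / 2.0 * (high - low)
        start[name] = rng.uniform(low + margin, high - margin)
    return start


# Hypothetical bounds, for illustration only.
example_bounds = {"temperature": (700.0, 900.0), "pressure": (10.0, 110.0)}
print(random_start(example_bounds, rng=random.Random(0)))
```

Keeping the start away from the boundary leaves room for the first-order design, which extends one level size on either side of the center point.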

Figure 3.6. User-Specified Start Location Menu GUI

The  “Get  Started”  button  is  pressed  once  the  user  is  ready  to  advance  from  the 
start  menu.  This  button  has  green  text  to  signify  that  this  action  will  continue  the  RSM 
process  as  recommended  by  the  programmer.  Buttons  with  black  text  are  optional 
actions that may alter the original model plan. Buttons with red text indicate inadvisable
actions or actions that restart the entire process. This coloring scheme is intended to help
novice  users  navigate  quickly  through  the  GUI  of  each  process. 

After  the  start  menu,  the  number  of  available  experiments  determines  the  next 
course  of  action.  If  the  number  of  available  runs  is  less  than  20,  the  first-order  design  is 
executed  in  two  separate  orthogonal  blocks.  Orthogonal  blocking  limits  the  nuisance 
effects  created  by  performing  sets  of  experiments  on  different  patches  or  at  a  much 
different  time.  When  blocking,  the  first-order  block  1  menu  appears  and  informs  the  user 
that a block requires ten runs, as shown in Figure 3.7. This menu displays the three
possible factor levels, so the user is aware of the settings tested in this design. If the user
chooses to continue, the coded first block design is saved to a data file and is also written
to the planner file in engineering units. The status is updated to “FO Block 1” to
represent that only the first block is complete. The full first-order design menu is
displayed when 20 or more runs are available. This menu is similar to the first block menu,
but reflects the full design. The same actions described for the first block are performed
on the full design, but the status is changed to “FO Full” to represent that all first-order
runs are complete.
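The branching on the number of available runs can be summarized in a few lines. The sketch below is illustrative Python with a hypothetical function name; the status strings and the thresholds of 10 and 20 runs are taken from the description above.

```python
def status_after_start(available_runs, block_size=10, full_size=20):
    """Return the status the planner holds after the start menu is confirmed."""
    if available_runs < block_size:
        # Too few runs: a warning is shown, nothing is planned,
        # and the status does not change.
        return "Start"
    if available_runs < full_size:
        # Enough runs for one block of ten only.
        return "FO Block 1"
    # Enough runs for the full first-order design of 20.
    return "FO Full"


for runs in (5, 10, 19, 20):
    print(runs, status_after_start(runs))
```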


Figure  3.7.  First-order  Block  1  Design  Menu  GUI 

After  completing  the  first  block  of  the  first-order  design,  the  “FO  Block  1”  status 
is  captured  upon  the  next  planner  access.  The  planner  obtains  the  response  values  from 
the  first  block  and  saves  these  results  to  the  first-order  response  data  file.  The  second 
block menu displays the same information as the other two first-order menus. If
ten runs are available, the coded design of the second block is appended to the first
block design in the data file. The second block is written to the planner file in
engineering units. The status updates to “FO Block 2” to represent that the second block
of the first-order design is complete. The flowchart that depicts the planner’s actions for
a “FO Block 2” status is shown in Figure 3.8.


Figure 3.8. First-order Design Block 2 Flowchart

3.3  Lack  of  Fit  Test  and  Post-Results  Flow 

The lack of fit test and post-results flowchart is shown in Figure 3.9. When the status
is “FO Full” or “FO Block 2”, first-order design experimentation is finished. If the status
is “FO Block 2”, the second block response data is appended to the first block results. If
the status is “FO Full”, all of the response data is obtained and saved. The lack of fit test
is performed using the coded first-order design and the response data. This test returns a
p-value, which is the deciding factor for the next course of action.

Figure 3.9. Lack of Fit Test and Post-Results Flowchart

The  lack  of  fit  test  results  menu  displays  the  p-value  result  and  several  selectable 
actions,  as  shown  in  Figure  3.10.  If  the  p-value  is  less  than  or  equal  to  the  threshold  of 
0.05,  augmentation  to  a  second-order  design  is  recommended.  If  the  p-value  is  greater 
than  this  threshold,  the  search  process  is  recommended.  To  proceed  with  the 
recommendation,  the  user  should  select  the  “Experiment  with  Result”  button.  The  user 
also  has  the  option  to  not  proceed  with  the  recommendation  and  force  a  second-order 
design or search process. The RSM process can benefit from forcing second-order
design augmentation if two search iterations have executed and the maximum response
is improving only minimally. Forcing the search process requires additional experiments, but
can help to shift the experimentation region closer to the optimal solution. The decision
to force either option primarily involves the tradeoff between the accuracy of the final
solution estimate and the number of additional experiments the user is willing to
perform.
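The recommendation rule and the manual override can be expressed compactly. The sketch below is illustrative Python; the function name, the return strings, and the `force` argument are hypothetical, while the threshold of 0.05 and the direction of the comparison follow the description above.

```python
def next_action(p_value, threshold=0.05, force=None):
    """Return the next course of action after the lack of fit test.

    force: None to follow the recommendation, or "search" or
    "second-order" to model the user's manual decision.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    if force is not None:
        if force not in ("search", "second-order"):
            raise ValueError("unknown manual decision")
        return force
    # Significant curvature (small p-value) calls for a second-order design.
    return "second-order" if p_value <= threshold else "search"


print(next_action(0.03))                  # second-order
print(next_action(0.40))                  # search
print(next_action(0.40, force="second-order"))  # second-order
```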


[Screenshot of the lack of fit test results menu: an explanation of the test and its null
hypothesis, the p-value from the lack of fit test, the p-value threshold, the resulting
recommendation, “Cancel” and “Experiment with Result” buttons, and an optional manual
decision section that shows the number of first-order designs and offers “Force Search”
and “Force Second-Order” buttons.]


Figure  3.10.  Lack  of  Fit  Test  Results  Menu  GUI 


To begin the search process, a first-order regression model is required to
compute the path of steepest ascent. The model coefficients are calculated by matrix
multiplication involving the coded first-order design and the response values. The
coefficients and intercept of the first-order model are displayed to the user in the linear
regression model menu. The regression menu GUI is shown in Figure 3.11. Due to the
log transformation, the coefficients are difficult to interpret in the original response
units. However, the sign and magnitude of each coefficient still indicate the direction of
the path of ascent. Following the regression model menu, the model coefficients are the
main input of the search process. Section 3.4 discusses the search process flow.


Figure  3.11.  Linear  Regression  Model  GUI 


The  second-order  design  augmentation  process  begins  with  displaying  the 
second-order  design  menu.  This  menu  is  similar  to  the  first-order  design  menus,  but 
adjusts  the  low  and  high  levels  to  reflect  the  axial  distance,  as  shown  in  Figure  3.12.  The 
second-order  design  augmentation  involves  axial  runs  and  additional  center  point  runs,  if 
applicable.  These  augmented  experiments  are  first  generated  in  the  coded  form, 
randomized,  and  then  saved  to  a  data  file.  The  runs  are  converted  to  engineering  units 
and then written to the planner file. The status is updated to “Second Order” to represent
that the second-order design is complete. Section 3.5 discusses the analysis of the
second-order design results.
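
The augmentation steps above (generate coded axial runs, randomize, convert to engineering units) can be sketched as follows. This is a minimal Python illustration, not the planner's C# code. The axial distance `alpha=2.0`, the two extra center points, and the fixed random seed are assumptions for the example; the center and level sizes are the first four settings of Table 4.1.

```python
import random

def axial_runs(k, alpha, n_center=0):
    """Coded axial block of a CCD: +/-alpha on each axis plus center points."""
    runs = []
    for j in range(k):
        for sign in (-1.0, 1.0):
            run = [0.0] * k
            run[j] = sign * alpha
            runs.append(run)
    runs.extend([0.0] * k for _ in range(n_center))
    return runs

def to_engineering(coded_run, center, level_size):
    """Convert one coded run to engineering units."""
    return [c + x * s for x, c, s in zip(coded_run, center, level_size)]

rng = random.Random(1)                 # fixed seed so the example is repeatable
block = axial_runs(k=4, alpha=2.0, n_center=2)
rng.shuffle(block)                     # randomize the run order
center = [0.9706, 0.9706, 20, 725]     # Ratio 1, Ratio 2, Pressure, Temperature
level_size = [0.1, 0.1, 3, 40]
planned = [to_engineering(run, center, level_size) for run in block]
```

The conversion is the usual coding relationship: engineering value = center + coded value x level size.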


Figure  3.12.  Second-order  Design  Menu  GUI 


3.4  Search  Process  Flow 

The search process consists of three main phases: initialization, analysis, and
continuation. Initialization of the search process involves establishing the path of
steepest ascent and saving the important search information to data files for later use.
The analysis of the search process determines if a maximum value is obtainable from the
search results. If no maximum value is found, then the search process continues. When a
maximum value is found, a new first-order design is centered at this location and the
RSM process advances. The search process initialization flowchart is shown in
Figure 3.13.

Figure 3.13. Search Process Initialization Menu GUI


3.4.1  Search  Process  Initialization 

The search process initialization begins with the calculation of the path of
steepest ascent using the unit gradient approach. This path is then stored in a data file, so
the same path can be reused if the maximum value is not yet obtainable. The factor level
boundaries are calculated in coded form using the current search location and the level
sizes. This coded boundary range prevents the search from planning experiments outside
of the region of operability. The planner generates no more than ten experiments
during the search phase to limit the number of unnecessary experiments. After an
experiment is generated in coded units, it is compared to the coded boundary range. If
any factor level falls outside of this range, that level is set equal to the coded boundary.
Experimentation continues on the boundary for the remainder of the planned experiments.
Section 3.6.3 discusses this boundary technique in greater detail. The coded search
experiments are saved to a data file and then converted to engineering units for the
planner file. The status is updated to “Search” to represent that the RSM model is in the
search process phase.
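
The initialization steps can be sketched as below. This is a minimal Python illustration of the unit gradient path with boundary clamping, not the planner's C# implementation. The function name, the `first_step` argument for continued searches, and the two-factor coefficients and bounds are hypothetical.

```python
import math

def steepest_ascent_runs(coefs, start, lower, upper, first_step=1, n_runs=10):
    """Plan coded search runs along the unit gradient, clamped to the bounds.

    coefs: first-order main effect estimates (coded units)
    start: coded search origin; lower/upper: coded boundary range
    first_step: step number of the first run (above 1 when continuing a search)
    """
    norm = math.sqrt(sum(b * b for b in coefs))
    direction = [b / norm for b in coefs]            # unit gradient
    runs = []
    for step in range(first_step, first_step + n_runs):
        run = [x0 + step * d for x0, d in zip(start, direction)]
        # Any level outside the range is set equal to the coded boundary.
        run = [min(max(x, lo), hi) for x, lo, hi in zip(run, lower, upper)]
        runs.append(run)
    return runs

runs = steepest_ascent_runs(coefs=[3.0, -4.0], start=[0.0, 0.0],
                            lower=[-5.0, -2.0], upper=[5.0, 5.0])
```

In this example the second factor reaches its lower bound at the third run and stays there, while the first factor keeps moving until it reaches its own upper bound.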

3.4.2  Search  Process  Analysis  Phase 

This  subsection  discusses  the  search  analysis  phase.  Traditionally,  each  search 
process  run  is  analyzed  individually  against  a  stopping  criterion.  Due  to  the  lack  of 
communication  between  ARES  and  the  planner,  streams  of  experiments  are  analyzed 
until  a  stopping  point  is  established.  The  flowchart  in  Figure  3.14  displays  the  main 
algorithm  for  the  analysis  phase.  The  first  step  is  to  obtain  the  search  response  data  and 
then  append  the  data  to  the  previous  results  from  that  search,  if  applicable.  It  is 
important  to  analyze  all  of  the  search  results  together,  because  data  essential  to  the 
stopping  condition  can  exist  on  separate  streams  of  experiments.  A  loop  is  applied  to 
analyze  each  experiment  result.  The  counter  for  this  loop  is  a  step  variable  which 
increases  by  one  after  each  experiment  is  analyzed.  The  maximum  response  value  and 
the  step  at  which  it  occurs  are  saved  in  case  the  stopping  condition  is  achieved.  The 
stopping condition is achieved when the response value has decreased in two
consecutive steps (Myers et al., 2009:182). This stopping criterion is robust enough that a
single extreme outlier in the positive or negative direction does not trigger it.

Figure 3.14. Search Process Analysis Phase Flowchart

The analysis loop ends if there are no more experiments to assess or the stopping
condition is achieved. If the stopping condition is achieved, the search location with the
maximum response becomes the center of a new first-order design. It is possible that the
maximum response is located many experiments before the stopping condition. These
maximum responses appear as outliers in the positive direction, but should not be
overlooked. Narrow peaks in the response surface are hard to identify with large factor
level sizes. The location of the maximum response is updated in the data file as the
current search location. A menu then informs the user of the current maximum
response and location, as shown in Figure 3.15. The new first-order design is generated
around this location using the same factor level sizes from the initial start menu. The
first-order design procedure is identical to the description in Section 3.2. The status
indicator updates to the appropriate first-order status depending on the number of
available experiments.
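
The analysis loop and its stopping rule can be sketched as follows. This Python example is illustrative only (the planner is written in C#); the function name and the two response streams are hypothetical. It shows why the streams must be appended before analysis: the first decrease falls at the end of one stream and the second at the start of the next.

```python
def analyze_search(streams):
    """Scan appended search results for two consecutive decreases.

    streams: list of response lists, one per set of search experiments.
    Returns (stopped, max_step, max_response); steps are numbered from 1.
    """
    responses = [y for stream in streams for y in stream]   # append streams
    max_step, max_response = None, float("-inf")
    for step, y in enumerate(responses, start=1):
        if y > max_response:
            max_step, max_response = step, y
        # Stop when the response decreased in each of the last two steps.
        if step >= 3 and y < responses[step - 2] < responses[step - 3]:
            return True, max_step, max_response
    return False, max_step, max_response

# Hypothetical streams: neither one alone contains two consecutive decreases.
print(analyze_search([[5.0, 7.0, 9.0, 8.0], [6.0, 4.0]]))   # (True, 3, 9.0)
```

Analyzed separately, each stream above contains only one decrease, so the stopping condition would be missed.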


Figure  3.15.  Search  Process  Stopped  Menu  GUI 


3.4.3  Search  Process  Continuation  Phase 

The search process continues if two consecutive decreasing values are not found
in the search response results. The flowchart for the continuation phase is shown in
Figure 3.16. The first step is to display a menu to inform the user of the continuation of
the search process, as shown in Figure 3.17. This menu displays the current maximum
response and the factor level increments of the search. The experiments in this phase are
generated in the same manner as the initialization phase. The status indicator remains at
“Search” to represent that the search process is still active.


Figure  3.16.  Search  Process  Continuation  Phase  Flowchart 


Figure 3.17. Search Continuation Menu GUI

3.5 Solution Determination and User Report


After the axial block experiments are performed, the remaining steps of the RSM process
focus on obtaining an optimal solution. The flowchart for second-order model
generation, canonical analysis, and the solution report is displayed in Figure 3.18. The
first step is to obtain the augmentation response results from the data file. These results
and the augmentation coded design are appended to the previous first-order response
results and coded design, respectively. The combination of these two design portions
completes the Central Composite Design (CCD). To generate a
second-order model, the independent variable matrix must also include interaction and
quadratic terms. Each additional term is calculated by element-wise multiplication of the
two corresponding main effect columns in the matrix; for a quadratic term, a column is
multiplied by itself. After this calculation, the second-order model
coefficients are generated and then assigned to the appropriate canonical analysis matrix
or vector.
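
The construction of the interaction and quadratic terms can be sketched as follows. This Python function is an illustration, not the planner's C# code, and the column ordering it uses (intercept, main effects, interactions, quadratics) is an assumption.

```python
from itertools import combinations

def second_order_row(x):
    """Expand one coded run into second-order model terms.

    Order: intercept, main effects, two-factor interactions, pure quadratics.
    """
    interactions = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    quadratics = [xi * xi for xi in x]
    return [1.0] + list(x) + interactions + quadratics

row = second_order_row([1.0, -1.0, 2.0])
# row == [1.0, 1.0, -1.0, 2.0, -1.0, 2.0, -2.0, 1.0, 1.0, 4.0]
```

For five factors this expansion yields 1 + 5 + 10 + 5 = 21 model terms per run.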
Figure 3.18. Canonical Analysis and Solution Flowchart


The coded stationary point x_s is obtained through canonical analysis of the
second-order model. For this point to be considered the final solution, it must fall within
the second-order experimentation region and appear to represent a local maximum. It is
possible that the estimated stationary point is located inside the design region, but is
a minimum or saddle point in canonical form. A minimum or saddle point
response value is likely of no use to the user. Classical canonical analysis uses the
eigenvalues of the second-order coefficient matrix to classify the response surface. For
this tool, the value of the predicted response is instead
compared to the second-order design response data. If the predicted response is less than
the average of the second-order design response data, the canonical form is likely not a
maximum. If the canonical analysis does not appear to provide a maximum solution, the
experiment from the CCD with the largest response value is considered the alternative
optimal solution.
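
The selection logic above can be sketched in Python as follows. The sketch assumes the standard second-order form y = b0 + x'b + x'Bx, for which the stationary point is x_s = -(1/2) B^-1 b and the predicted response there is b0 + (1/2) x_s'b. The function names, the region check against an axial distance `alpha`, and the assumption that B is nonsingular are illustrative choices, not details confirmed by the planner's C# code.

```python
def solve(a, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                      # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def choose_solution(b0, b, B, ccd_runs, ccd_responses, alpha):
    """Return (source, coded_point, response) using the averaging heuristic."""
    xs = solve(B, [-0.5 * bi for bi in b])            # x_s = -(1/2) B^-1 b
    ys = b0 + 0.5 * sum(x * bi for x, bi in zip(xs, b))
    inside = all(abs(x) <= alpha for x in xs)         # assumed region check
    mean_y = sum(ccd_responses) / len(ccd_responses)
    if inside and ys >= mean_y:
        return "canonical", xs, ys
    # Otherwise fall back to the best observed CCD experiment.
    best = max(range(len(ccd_responses)), key=lambda i: ccd_responses[i])
    return "ccd", ccd_runs[best], ccd_responses[best]
```

The heuristic trades the certainty of an eigenvalue classification for simplicity: a stationary point that predicts less than the average observed response is treated as unlikely to be a maximum.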

A  menu  displays  to  report  the  appropriate  solution  to  the  user,  as  shown  in 
Figure  3.19.  The  user  is  informed  whether  the  result  is  obtained  through  canonical 
analysis  or  from  the  CCD  results.  The  predicted  optimal  response  and  the  corresponding 
factor  level  settings  are  displayed.  If  the  user  is  satisfied  with  the  results,  there  is  an 
option  to  save  the  results  to  a  data  file. 


Figure 3.19. Predicted Solution Report GUI

3.6 Additional Challenges of Autonomy Incorporation


3.6.1  Experiment  Success  and  Data  Capture  Rate 

Carbon nanotube experimentation is an arduous process. Numerous
nuisances can influence the success of each experiment. These nuisances can hinder the
success of an experiment independently of the actual factor level
settings of that run. On a regular basis, a set of experiments contains at least one run that
did not have successful growth due to extraneous influences on ARES. These
unsuccessful runs are usually unrelated to planned factor settings. All stages of the RSM
process depend on reliable response data, so it is critical to obtain as many successful runs
as possible. Ultimately, it is more beneficial to advance through the RSM process with
some unsuccessful results than to continually retest the same design until perfect results
are obtained.

The  first  and  last  run  of  each  experiment  set  can  produce  unreliable  or  missing 
results.  The  researchers  do  not  currently  have  the  ability  to  perform  trial  experiments  on 
ARES.  Trial  experiments  can  help  a  system  “warm-up”  and  reach  a  steady-state 
performance  before  the  designed  experiments  begin.  The  lack  of  any  trial  runs  or 
previous  experimentation  on  ARES  leads  to  a  high  rate  of  unsuccessful  growth  on  the 
first  designed  experiment  run.  Also,  ARES  is  currently  unable  to  capture  growth  data  on 
the  last  run  of  each  designed  experiment  set,  although  the  researchers  can  occasionally 
approximate  an  initial  growth  rate  of  the  last  experiment.  Overall,  the  issues  with  the 
first and last experiment can affect up to 20 percent of the first-order design runs. Yet
due to the randomization and the design resolution, the main effect estimates should not
change significantly.

It is also possible that most experiments in a set are unsuccessful. This can occur
from a number of nuisances including faulty temperature calibrations or a program
memory leak. When a large percentage is unsuccessful, an entire retest is expected.
Experimenting in partial sets is not recommended due to the importance of blocking. It
is highly preferred that each block is tested on the same patch of silicon pillars. When
retesting is required, adjusting the status indicator and the appropriate data files is
difficult for a new system user. A methodology must exist to restore the RSM
process to its state before the set of unsuccessful experiments. It is critical that
the planner does not delete, and can recover, all of the necessary information to move a
step backwards in the process.

3.6.2  Model  Adequacy  and  Outlier  Analysis 

In traditional RSM practice, the analyst can validate the assumptions of each
linear regression model during its creation. The most critical adequacy checks involve
visually assessing the model’s residuals. These visual checks are not feasible in an
autonomous RSM system. Historical data, if it is available, can be analyzed to foresee if
model adequacy is an issue. Data transformations on the response and regressor
variables can help ensure normality and constant variance of the model’s residuals. In this
study, a log transformation is applied to the response variable to correct a funnel-shaped
trend identified in a constant variance plot. The log transformation was verified on the
prior data and can also be further verified using experiments designed from the RSM
planner. If residual analysis continually demonstrates that the log transformation is
appropriate, then the regression model assumptions are likely satisfied in future RSM
processes.

The planner does not perform any internal outlier analysis. The carbon
researchers  perform  their  own  analysis  on  outliers.  The  analysis  primarily  involves  the 
determination  of  whether  the  extreme  value  was  caused  by  an  error  in  the  ARES  system. 
If  response  data  is  deemed  invalid  by  the  researcher,  it  should  be  adjusted  before  the 
planner  obtains  the  value.  Statistical  outliers  may  still  exist  in  each  linear  model,  but 
should  not  significantly  affect  coefficient  estimates  due  to  the  robustness  of  factorial 
experimental  designs  to  outliers  (Montgomery,  2009:268).  Moreover,  an  outlier  in  the 
extreme  positive  direction  could  represent  a  small  region  where  a  local  maximum  exists. 
If  outliers  or  other  nuisance  factors  are  continually  problematic,  experiment  replication 
in  any  stage  of  the  RSM  process  is  recommended. 

3.6.3  Mathematical  and  Functional  Techniques  in  C# 

The C# language is limited in its built-in mathematical functions and capabilities. Several
open-source software packages exist that provide some advanced mathematical
techniques, but none of these packages are included in the planner for several reasons.
First, the packages lack adequate documentation and instructions. Second, the
packages use different object classes. These object classes are not compatible with the
mathematical functions and techniques already developed in the planner code. Lastly, it
is dangerous to incorporate open-source software into computers on government
networks or even on stand-alone units. Many technical functions were developed for C#
during the research time period, but there are several techniques that required
workarounds due to the lack of capability.

Simplified Boundary Technique.


During  the  path  of  steepest  ascent,  the  search  continues  along  the  determined 
path  until  it  reaches  a  contact  point  on  a  factor  level  boundary.  The  contact  point 
typically  occurs  in  between  full  runs  in  the  search  process.  For  example,  the  contact 
point  could  occur  at  5.67  coded  steps  from  the  design  center  which  is  in  between 
planned  runs  five  and  six.  The  planner  increases  the  integer  step  counter  by  one  for  each 
experiment.  If  the  contact  point  experiment  is  tested,  the  levels  of  the  unconstrained 
factors  need  to  be  adjusted  to  reflect  a  fractional  run  and  the  step  counter  is  offset  by 
one.  Rather  than  add  this  increased  complexity,  the  contact  point  experiment  is  skipped 
and  the  search  resumes  at  the  next  full  run.  The  additional  effort  required  to  code  the 
contact  point  experiment  outweighs  the  benefit  of  including  this  experiment  in  the 
search  process.  The  appropriate  constrained  path  is  still  followed  along  the  boundary. 
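
The fractional contact point can be computed directly, which makes the skipped experiment easy to see. The following Python sketch is illustrative; the function name and the two-factor start, direction, and bounds are hypothetical values chosen so that contact occurs near 5.67 steps, as in the example above.

```python
import math

def contact_step(start, direction, lower, upper):
    """Fractional number of steps until the path first meets a boundary.

    Returns (steps, factor_index); steps is math.inf if no boundary is met.
    """
    best, factor = math.inf, None
    for j, (x0, d, lo, hi) in enumerate(zip(start, direction, lower, upper)):
        if d == 0:
            continue                        # this factor never moves
        bound = hi if d > 0 else lo
        steps = (bound - x0) / d
        if steps < best:
            best, factor = steps, j
    return best, factor

steps, factor = contact_step(start=[0.0, 0.0], direction=[0.6, -0.8],
                             lower=[-5.0, -5.0], upper=[3.4, 5.0])
first_run_on_boundary = math.ceil(steps)    # the contact point itself is skipped
```

Here contact occurs between runs five and six, so run six is the first experiment planned on the boundary.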

The  search  continues  on  the  boundary  until  the  response  stops  increasing.  When 
the  planner  assesses  that  the  boundary  is  reached  and  a  maximum  response  is  found,  the 
RSM  process  is  considered  complete.  Normally,  a  first-order  design  is  centered  at  the 
stopping  point,  but  this  is  not  possible  on  the  boundary.  Due  to  the  unlikelihood  that  a 
new  design  searches  away  from  the  boundary,  the  maximum  search  location  is 
considered  the  final  solution.  The  search  boundary  location  menu  displays  this  final 
solution  to  the  user,  as  shown  in  Figure  3.20.  The  menu  provides  a  recommendation  to 
restart  the  process  at  a  different  initial  start  to  hopefully  find  a  true  optimal  solution 
within the region of operability.

[Figure 3.20 screenshot text. Window title: Search Boundary Reached.
Message: "During the search process, at least one of the variables reached the feasible
boundary. Use a different start location to possibly find a true optimal solution within
the boundary. If changing the initial settings does not work, the factors on the boundary
should be set to a stationary point somewhere inside the boundary."
Best Result Found on the Boundary (fields): Maximum Nu; Argon Mixture (%);
Ethylene Mixture (%); Hydrogen Mixture (%); Total Pressure (Torr);
Temperature (Celsius); Water Concentration (ppm).
Button row: Just Close / Save and Close.]


Figure 3.20. Search Boundary Solution Menu GUI

Canonical Analysis Without Eigenvalue Capability

To determine the true nature of the response surface, one could analyze the
eigenvalues of the B matrix. With these eigenvalues, there is a greater possibility to
correctly label the response surface as a minimum, maximum, saddle point, or a ridge
system. However, algorithms for calculating all five eigenvalues of the B matrix are
quite difficult to implement in the C# environment. It is still possible to gain insight into
the canonical form even without the eigenvalue analysis. The canonical form is likely a
minimum or a saddle point if the optimal response value is lower than the average
response from the CCD. A ridge system is likely if the optimal setting is well outside of
the CCD region of experimentation. If neither condition holds, the canonical form is
predicted to be a maximum through a process of elimination. This alternate procedure is
followed after the optimal coded settings are
determined. The sign and magnitude of the second-order coefficients can occasionally
provide insight into the canonical form, but this is an unreliable technique when
interaction terms are significant.

Verification  of  Planner  with  Test  Distribution 

During the design phase, the planner was coded on a personal computer separate
from the ARES system. The RSM process requires response data to advance, but true
experimentation can only occur after implementation on ARES. To
provide simulated results for the planner, a multivariate normal distribution was applied
to represent ARES experimentation. Simulated results are crucial for the verification of
the planner’s algorithms and analytical techniques. The multivariate normal distribution
contains only one peak, so the results are fairly easy to interpret. The standard deviation
of each variable was large enough to ensure the planner never starts in an area with an
extremely flat surface. A normally distributed random error is applied to each response
result. It is helpful to start with a low degree of random error and then gradually increase
the error after further verification. With a large number of variables, the multivariate
normal density returns very small values. These values were scaled to larger values to
improve recognition of the results and to identify an appropriate random error scale.
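
A test surface of this kind can be sketched in a few lines of Python. The sketch assumes independent factors, so the joint normal density reduces to a product of univariate kernels, and it folds the normalizing constant into a hypothetical `scale` argument. The peak location, standard deviations, scale, and noise level are all illustrative values, not those used to verify the planner.

```python
import math
import random

def simulated_response(x, peak, sd, scale=100.0, noise_sd=1.0, rng=random):
    """Scaled normal density surface with additive normal error.

    Assumes independent factors; the normalizing constant of the density
    is absorbed by `scale` so that the peak response equals `scale`.
    """
    exponent = sum(((xi - p) / s) ** 2 for xi, p, s in zip(x, peak, sd))
    surface = scale * math.exp(-0.5 * exponent)      # single peak at `peak`
    return surface + rng.gauss(0.0, noise_sd)

rng = random.Random(0)                               # repeatable error stream
y = simulated_response([1.0, 2.0], peak=[1.5, 2.5], sd=[3.0, 3.0],
                       noise_sd=0.5, rng=rng)
```

Setting `noise_sd` to zero gives the noise-free surface, which is a convenient first check before the error is gradually increased.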
IV. Results and Analysis


This  chapter  presents  the  results  from  executing  the  Response  Surface 
Methodology  (RSM)  experiment  planner  on  the  Adaptive  Rapid  Experimentation  and 
Spectroscopy  (ARES)  system.  These  results  are  collected  to  validate  the  RSM 
techniques  and  verify  the  planner’s  algorithms  on  the  true  system.  The  operation  of  the 
planner  was  performed  under  supervision,  so  full  autonomy  was  not  applied  for  the 
collection  of  this  data.  The  operation  was  supervised,  because  the  planner  was  still  in 
early  stages  of  development  on  the  ARES  system.  The  supervision  did  lead  to  some 
significant  insights  regarding  the  performance  of  autonomy.  These  insights  are  discussed 
in Chapter 5. Not enough time was available to complete the full RSM process, but
significant findings are identified with the available data. This chapter is divided into
four sections. The first three follow the stages of the RSM process. The first section
discusses the initialization and first-order design. The second section discusses the
first-order model analysis and the search process. The third section discusses the second
first-order design and the proposed search process. The fourth section discusses concerns
with laser temperature calibration.

4.1  Initialization  and  First-Order  Design 

The first inputs into the planner are the number of available experiments and the
type of catalyst. For ease of implementation, the number of runs was hard-coded at ten
for each access of the planner. Ten runs were selected as the hard-coded value, because
each set of experiments throughout the entire RSM process can be planned at ten runs.
For this RSM process iteration, the researchers selected an iron-based catalyst. This
catalyst differs from other catalysts because it does not require a cooling agent. The
water concentration variable is not of interest for this RSM process iteration. The
software in the planner and ARES is designed to experiment with six total factors, so a
planned setting for water concentration is required in the planner file. The level size is
set to zero, so the water concentration is planned at the initial location value of 3
parts per million (ppm) for every experiment. In practice, no water is
applied to the catalyst and the 3 ppm merely satisfies the software. Any effect attributed
to water in the first-order model is just due to noise. Table 4.1 displays the initialization
settings used. A user-specified initial location was selected with the mixture gases
initialized at an almost equal blend.


Table  4.1.  RSM  Process  Initialization  Settings 


                   Ratio 1       Ratio 2       Pressure   Temp.       Water
                   (Ar / C2H4)   (H2 / C2H4)   (Torr)     (Celsius)   (ppm)
Initial Location   0.9706        0.9706        20         725         3
Lower Bound        0.0526        0             1          600         0
Upper Bound        9             8.5           40         1,100       80
Level Size         0.1           0.1           3          40          0


The first-order design was created in two blocks due to the number of available
experiments. Each block was executed in a random run order. The coded design settings
and the response results for the appended first-order design are displayed in Table 4.2.
This table also includes a notes column that describes the success of each experiment.
The first experiment of the first block did not show significant growth, which is expected
when no trial runs are performed. The 17th experiment was the only other experiment that
did not exhibit growth. Both results are valid according to the research expert, so the low
initial growth rate values are maintained. The last experiment likely had successful
growth, but the result data was not saved due to a malfunction within ARES. The
median response of the other 19 experiments is used as a replacement value for the last
experiment. While better techniques exist to replace a missing value in a factorial
design, the median was a quick solution used to advance the RSM process. The median
was chosen to avoid significantly affecting the first-order model with another extreme low
value, although the design is quite robust to outliers. The quality of fit in the notes of
Table 4.2 refers to how well the growth curve fit the G-band data in the opinion of the
research expert.
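
The replacement value can be checked with a short calculation. The Python snippet below is only an illustration of the arithmetic; it takes the 19 saved responses from Table 4.2 (runs 1 through 19) and computes their median.

```python
from statistics import median

# Responses of the 19 runs with saved data (Table 4.2, runs 1 through 19).
observed = [4.15, 136.37, 85.97, 84.82, 72.16, 114.14, 31.81, 41.29, 50.94,
            13.00, 654.45, 86.21, 68.67, 124.97, 62.21, 28.32, 0.11, 38.10,
            91.73]
replacement = median(observed)
print(replacement)   # 68.67
```

The median is unaffected by the extreme values 0.11 and 654.45, which is why it was preferred here over a mean of the same data.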


Table  4.2.  Coded  First-Order  Design  1  and  Results 


Run   Ratio 1   Ratio 2   Press.   Temp.   Water   Block   Response (ν)   Notes
  1     -1        -1         1      -1      -1       1          4.15      No growth
  2      1         1        -1      -1       1       1        136.37      Good fit
  3      1         1         1      -1      -1       1         85.97      Good fit
  4      0         0         0       0       0       1         84.82      Good fit
  5      0         0         0       0       0       1         72.16      Good fit
  6     -1         1        -1       1       1       1        114.14      Good fit
  7      1        -1        -1       1       1       1         31.81      Good fit
  8     -1         1         1       1      -1       1         41.29      Good fit
  9     -1        -1        -1      -1       1       1         50.94      Good fit
 10      1        -1         1       1      -1       1         13.00      Good fit
 11     -1         1        -1      -1      -1       2        654.45      Good fit
 12      0         0         0       0       0       2         86.21      Good fit
 13     -1         1         1      -1       1       2         68.67      Good fit
 14      1         1        -1       1      -1       2        124.97      Good fit
 15     -1        -1        -1       1      -1       2         62.21      Good fit
 16      0         0         0       0       0       2         28.32      Good fit
 17      1        -1         1      -1       1       2          0.11      No growth
 18      1        -1        -1      -1      -1       2         38.10      Okay fit
 19     -1        -1         1       1       1       2         91.73      Good fit
 20      1         1         1       1       1       2         68.67      Median


4.2 First Linear Regression Model and Search


The first verification of interest is the log transformation of the response in the
first-order model. The main effects model with and without the transformation is
generated to show that the transformation effectively stabilized variance. Figure 4.1
displays the untransformed model’s residual by predicted response plot. It is obvious in
this plot that the variance of the residuals is not constant. Figure 4.2 displays the
transformed model’s residual by predicted response plot. In this second plot, the
variance appears to have stabilized considerably.


Figure 4.1. Residual by Predicted Response Plot before Transformation


Figure 4.2. Residual by Predicted Response Plot after Transformation


The parameter estimates for the first-order model are generated with the coded
design matrix and the log transformed response values. Table 4.3 displays the parameter
estimates for the five main effects in the model and the intercept. This table also displays
the results of the lack of fit test for pure quadratic error and a significant two-factor
interaction for Ratio 2 with temperature. The design is orthogonal, so including these
additional estimates does not affect the main effect estimates, but the p-values are
slightly different. The t-test ratios and the p-values show which effects are significant in
this experimental region.

Table 4.3. Linear Model 1 Analysis and Lack of Fit Test


Parameter              Estimate   t Ratio   Prob > |t|
Intercept                3.9119     11.65     <.0001
Ar / C2H4               -0.4529     -1.51     0.1596
H2 / C2H4                1.0060      3.35     0.0065
Pressure                -0.8219     -2.74     0.0193
Temperature              0.3299      1.10     0.2953
Water Concentration     -0.2118     -0.71     0.4953
Lack of Fit Test        -0.2181     -0.65     0.5292
H2 / C2H4*Temp.         -0.6508     -2.17     0.0530
Block                    0.0114      0.04     0.9669

The  results  show  that  Ratio  2  and  total  pressure  are  the  most  significant  main 
effects.  Thus,  these  two  effects  are  the  strongest  contributors  to  the  main  direction  of  the 
path  of  steepest  ascent.  Ratio  1  and  temperature  are  not  significant  effects,  so  the  search 
path  will  not  greatly  change  for  either  variable.  The  lack  of  fit  test  returns  a  negative 
parameter  estimate  and  a  high  p-value,  so  there  is  virtually  no  indication  of  a  local 
maximum  in  this  region.  The  two-factor  interaction  is  interesting,  because  it  contradicts 
the  direction  of  the  main-effects  that  comprise  it.  Due  to  this  twisting  of  the  response 
surface  caused  by  interactions,  it  is  likely  that  more  than  one  search  iteration  is  needed 
to  optimize  all  factors.  Although  the  water  concentration  effect  is  only  modeling  noise, 
the  planner  still  incorporates  this  estimate  into  the  path  of  steepest  ascent  calculation. 
The  path’s  direction  for  the  other  factors  is  not  altered,  but  the  search  increments 
decrease  in  size.  Also,  examination  of  the  blocking  factor  for  this  design  revealed  that  it 
is  not  a  significant  effect. 
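
The main effect estimates in Table 4.3 can be approximately reproduced from the rounded values in Table 4.2. The Python sketch below is illustrative and is not the planner's C# code. It assumes a natural log transformation and uses the fact that, for an orthogonal two-level design, each least-squares main effect is the sum of the coded level times the log response, divided by the number of factorial runs. Small differences from Table 4.3 (about 0.003) remain, presumably because Table 4.2 reports rounded responses.

```python
import math

# Factorial runs from Table 4.2: (Ratio 1, Ratio 2, Press., Temp., Water, response).
# Center point runs are omitted; their coded levels are zero, so they do not
# enter the main effect contrasts.
runs = [
    (-1, -1,  1, -1, -1,   4.15), ( 1,  1, -1, -1,  1, 136.37),
    ( 1,  1,  1, -1, -1,  85.97), (-1,  1, -1,  1,  1, 114.14),
    ( 1, -1, -1,  1,  1,  31.81), (-1,  1,  1,  1, -1,  41.29),
    (-1, -1, -1, -1,  1,  50.94), ( 1, -1,  1,  1, -1,  13.00),
    (-1,  1, -1, -1, -1, 654.45), (-1,  1,  1, -1,  1,  68.67),
    ( 1,  1, -1,  1, -1, 124.97), (-1, -1, -1,  1, -1,  62.21),
    ( 1, -1,  1, -1,  1,   0.11), ( 1, -1, -1, -1, -1,  38.10),
    (-1, -1,  1,  1,  1,  91.73), ( 1,  1,  1,  1,  1,  68.67),
]

def main_effects(runs):
    """Least-squares main effects for an orthogonal two-level design."""
    k = len(runs[0]) - 1
    log_y = [math.log(r[-1]) for r in runs]          # natural log (assumed)
    # Every coded level is +1 or -1, so the sum of squares equals len(runs).
    return [sum(r[j] * y for r, y in zip(runs, log_y)) / len(runs)
            for j in range(k)]

effects = main_effects(runs)
```

The computed signs and relative magnitudes agree with Table 4.3: Ratio 2 has the largest positive effect and pressure the largest negative effect.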

The path of steepest ascent begins at the center of the first-order design, which is
always the initial search location for the first path. The search increment is calculated
using the main effect estimates and the factor level sizes. Table 4.4 displays the planned
experimental design for ten runs of the search path. The response results and notes are
displayed on the right side of this table.
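
The construction of the search design can be illustrated with a minimal sketch. It assumes that run k is simply the base point plus k times the Search A increment, which is consistent with the rows of Table 4.4. The function name `search_path` is hypothetical and is not taken from the planner software.

```python
def search_path(base, increments, n_runs):
    """Return n_runs points, where run k is base + k * increments."""
    return [
        [x0 + k * dx for x0, dx in zip(base, increments)]
        for k in range(1, n_runs + 1)
    ]


# Values copied from the "Base" and "Search A" rows of Table 4.4.
# Order: Ratio 1, Ratio 2, pressure (Torr), temperature (Celsius), water (ppm).
BASE = [0.971, 0.971, 20.0, 725.0, 3.0]
SEARCH_A = [-0.032, 0.070, -1.72, 9.22, 0.00]

path = search_path(BASE, SEARCH_A, 10)
for k, point in enumerate(path, start=1):
    print(f"Run {k:2d}: " + "  ".join(f"{v:8.3f}" for v in point))
```

Because the tabulated increments are rounded, later runs computed this way can differ from Table 4.4 in the last printed digit.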


Table 4.4. Search Process Design 1 and Results

            Ratio 1   Ratio 2   Press.    Temp.       Water    Response   Notes
                                (Torr)    (Celsius)   (ppm)    (v)
Search A    -0.032     0.070    -1.72       9.22      0.00
Base         0.971     0.971    20        725         3
Run 1        0.939     1.041    18.28     734.22      3.00      3.93      No growth
Run 2        0.907     1.111    16.55     743.45      3.00      8.40      Weak fit
Run 3        0.876     1.182    14.83     752.67      3.00     33.14      Good fit
Run 4        0.844     1.252    13.11     761.90      3.00     25.06      Good fit
Run 5        0.812     1.322    11.38     771.12      3.00     19.02      Good fit
Run 6        0.781     1.393     9.66     780.35      3.00     18.51      Good fit
Run 7        0.749     1.463     7.93     789.57      3.00     10.29      Good fit
Run 8        0.717     1.533     6.21     798.80      3.00      5.29      Good fit
Run 9        0.686     1.603     4.49     808.02      3.00      2.00      No growth
Run 10       0.654     1.674     2.76     817.25      3.00      1.00      No data

The first two runs did not have fully successful growth. The lack of strong
growth was attributed to nuisances within the ARES system. The first run is within the
original design region and is typically used as a confirmation run, so the lack of true
growth data is not much of a concern. The remainder of the results clearly shows that the
stopping conditions are achieved and that the third run contains the maximum response.
However, it is possible that the first or second run would have been the actual maximum
if reliable growth data had been available.

The search experiments were performed on a different patch than the first-order
design experiments, which could explain why the response values are much smaller in
magnitude than before. The smaller response values could indicate that the search path
traveled in the wrong direction, but this is unlikely due to the high significance and
insensitivity of the linear model. The second and third runs are very close together, so
selecting the third run as the maximum response over the second run has little impact.

4.3  Second  First-Order  Design  and  Proposed  Search 

The  next  stage  of  the  RSM  process  is  to  experiment  with  another  first-order 
design  centered  on  the  location  of  the  third  run  from  the  search  phase.  Due  to  several 
issues  with  the  ARES  system  that  caused  unsuccessful  experiments,  the  RSM  process 
required  a  restart  to  correct  the  status  and  data  files.  The  maximum  location  from  the 
search  was  inserted  as  the  user-specified  initial  location,  but  with  some  slight  rounding 
corrections.  The  factor  level  sizes  and  boundaries  remained  the  same.  Table  4.5  displays 
the  second  coded  first-order  design  and  the  response  results.  The  experimentation  of  this 
design also had multiple issues with unsuccessful runs. Because time and the number of
available experiments were limited, the blocking principle was disregarded in order to
obtain four additional data points. Where the planned run order differs from the actual
run order, the planned order is shown in parentheses in Table 4.5. The planned and actual
blocks are displayed in the same format.

Table 4.5. Coded First-Order Design 2 and Results

Run Order   Ratio 1   Ratio 2   Pres.   Temp.   Water Conc.   Resp. (v)   Block   Notes
1              0         0        0       0         0           76.29     1       Good fit
2              1        -1       -1       1         1           47.16     1       Good fit
3              0         0        0       0         0           56.80     1       Good fit
4              1         1       -1      -1         1           57.89     1       Good fit
5             -1        -1       -1      -1         1           47.59     1       Good fit
6             -1        -1        1      -1        -1           39.02     1       Good fit
7              1        -1        1       1        -1           45.35     1       Good fit
8             -1         1        1       1        -1           31.71     1       Good fit
9 (12)         0         0        0       0         0           97.33     2       Weak fit
10 (13)        1        -1        1      -1         1          237.67     2       Good fit
11 (14)        1         1       -1       1        -1           23.35     2       Good fit
12 (15)       -1         1        1      -1         1           45.00     2       Good fit
13 (16)        0         0        0       0         0           34.50     2       Okay fit
14 (17)       -1        -1       -1       1        -1          114.96     2       Good fit
15 (19)       -1         1       -1      -1        -1           16.82     2       Good fit
16 (9)         1         1        1      -1        -1           16.00     3 (1)   Estimation
17 (10)       -1         1       -1       1         1           41.00     3 (1)   Estimation
18 (11)        1         1        1       1         1           42.00     3 (2)   Estimation
19 (18)       -1        -1        1       1         1           38.00     3 (2)   Estimation
20             1        -1       -1      -1        -1           61.65     2       Prediction

The  first  block  included  eight  out  of  ten  successful  experiments  and  the  second 
block  included  seven  out  of  ten  successful  experiments.  All  five  of  the  unsuccessful 
experiments  are  factorial  runs.  The  loss  of  five  factorial  runs  could  significantly  impact 
the  linear  model  estimates,  so  four  of  these  runs  were  executed  in  a  separate  block.  The 
last  run  was  not  obtainable,  so  a  predicted  response  value  is  used  from  a  linear  model 
created  with  the  19  successful  runs.  Again,  better  techniques  may  exist  to  replace  a 
missing  value,  but  this  approach  was  used  as  a  quick  solution  on  the  ARES  system.  The 
additional four runs were executed on the same patch as the second block runs. The
researcher  manually  estimated  the  initial  growth  rates  for  these  four  runs. 

The  linear  model  of  the  second  first-order  design  is  displayed  in  Table  4.6. 
Normally,  the  planner  executes  the  lack  of  fit  test  before  determining  if  another  linear 
model  is  necessary,  but  it  is  calculated  simultaneously  with  the  linear  model  for  the 
purpose  of  this  analysis.  The  lack  of  fit  test  effect  estimate  is  negative,  so  there  is  no 
indication  that  the  current  experimental  region  is  a  local  maximum.  Only  Ratio  2  is  a 
significant  effect  estimate  in  this  second  linear  model.  The  total  pressure  is  no  longer  a 
significant  variable,  so  within  the  current  region  of  experimentation  the  response  is  no 
longer affected by changes to this variable. Pressure may become significant again after
more searches, but the apparent maximization of this variable demonstrates the
effectiveness of the search process. The sign of the Ratio 2 coefficient is opposite to its
sign in the previous model, which is likely due to the significant interaction term in
that model. No interaction terms are significant for this model, so the true path of
steepest ascent should now lack curvature.


Table 4.6. Linear Model 2 Analysis and Lack of Fit Test

Parameter              Estimate   t Ratio   Prob > |t|
Intercept                3.9582     25.07     <.0001
Ar / C2H4                0.0795      0.56     0.5832
H2 / C2H4               -0.3531     -2.5      0.0265
Pressure                 0.0021      0.01     0.9886
Temperature             -0.0294     -0.21     0.8382
Water Concentration      0.2246      1.59     0.1357
Lack of Fit Test        -0.1651     -1.05     0.3146

There was not enough time to conduct further experimentation and provide
results for the second search process in this study. The planned search design for ten
runs is displayed in Table 4.7. It is clear from the search design that the primary
objective is the optimization of the hydrocarbon gas mixture blend. The pressure barely
increases and the temperature slightly decreases. The temperature was originally set at
725 degrees, so it is possible that a true optimum exists somewhere between 725 and 750
degrees.
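
The Search A increments in Table 4.7 appear consistent with scaling the unit vector of the Table 4.6 main-effect estimates by each factor's level size. The sketch below illustrates that reading; it is an inference from the tabulated numbers, not the planner's documented formula. The level sizes are assumptions: 40 degrees matches the planned temperatures of 710, 750, and 790, while the ratio and pressure sizes are back-calculated from the tables. Water is held fixed because both search tables list a water increment of 0.00; the reason is not stated in this section. All function and variable names are hypothetical.

```python
import math


def steepest_ascent_increments(coefs, level_sizes, held_fixed=()):
    """Scale the unit gradient (coded units) by each factor's level size.

    Every coefficient contributes to the norm, including those of factors
    that are held fixed, so a noisy estimate shortens the other steps
    without changing their direction.
    """
    norm = math.sqrt(sum(b * b for b in coefs.values()))
    return {
        name: 0.0 if name in held_fixed else b / norm * level_sizes[name]
        for name, b in coefs.items()
    }


# Main-effect estimates from Table 4.6 (Ratio 1 = Ar/C2H4, Ratio 2 = H2/C2H4).
MODEL_2 = {
    "ratio1": 0.0795,
    "ratio2": -0.3531,
    "pressure": 0.0021,
    "temperature": -0.0294,
    "water": 0.2246,
}

# ASSUMED level sizes (natural units per coded unit); see the note above.
LEVEL_SIZES = {
    "ratio1": 0.1,
    "ratio2": 0.1,
    "pressure": 3.0,
    "temperature": 40.0,
    "water": 1.0,  # placeholder only; water is held fixed below
}

deltas = steepest_ascent_increments(MODEL_2, LEVEL_SIZES, held_fixed={"water"})
for name, step in deltas.items():
    print(f"{name:12s} {step:8.3f}")
```

Under these assumptions the computed steps agree with the Search A row of Table 4.7 to within rounding.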


Table 4.7. Search Process 2 Design

            Ratio 1   Ratio 2   Pressure   Temp.       Water
                                (Torr)     (Celsius)   (ppm)
Search A     0.019    -0.083     0.01       -2.76      0.00
Base         0.879     1.182    15.00      750.00      3.00
Run 1        0.897     1.099    15.01      747.24      3.00
Run 2        0.916     1.016    15.03      744.49      3.00
Run 3        0.935     0.934    15.04      741.73      3.00
Run 4        0.953     0.851    15.06      738.98      3.00
Run 5        0.972     0.768    15.07      736.22      3.00
Run 6        0.990     0.686    15.09      733.47      3.00
Run 7        1.009     0.603    15.10      730.71      3.00
Run 8        1.028     0.520    15.12      727.96      3.00
Run 9        1.046     0.438    15.13      725.20      3.00
Run 10       1.065     0.355    15.14      722.45      3.00

The search design in terms of the actual mixture variables is displayed in Table
4.8. The search increases argon at the expense of hydrogen, while also slowly increasing
ethylene. Based on previous experience, the research expert agreed that lower allocations
of hydrogen could improve growth results (Nikolaev et al., 2015).
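
The conversion from the two ratio variables to mixture fractions can be sketched as follows. It assumes that Ratio 1 is Ar/C2H4 and Ratio 2 is H2/C2H4, as labeled in Table 4.6, and that the three gases make up the whole mixture, with the ppm-level water treated as negligible. The function name is hypothetical.

```python
def mixture_fractions(ratio1, ratio2):
    """Convert Ar/C2H4 and H2/C2H4 ratios to (argon, ethylene, hydrogen) fractions.

    With ethylene as the common denominator, the three parts are
    ratio1 : 1 : ratio2, and each fraction is its part over the total.
    """
    total = ratio1 + 1.0 + ratio2
    return ratio1 / total, 1.0 / total, ratio2 / total


# Base point of the second search (Table 4.7): Ratio 1 = 0.879, Ratio 2 = 1.182.
argon, ethylene, hydrogen = mixture_fractions(0.879, 1.182)
print(f"argon={argon:.3f}  ethylene={ethylene:.3f}  hydrogen={hydrogen:.3f}")
```

Applied to the Base and Run 10 rows of Table 4.7, this conversion gives the corresponding rows of Table 4.8 after rounding to three decimals.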

Table 4.8. Search Process 2 Design for Mixture Variables

            Argon        Ethylene     Hydrogen
            (fraction)   (fraction)   (fraction)
Base        0.287        0.327        0.386
Run 1       0.299        0.334        0.367
Run 2       0.312        0.341        0.347
Run 3       0.326        0.349        0.326
Run 4       0.340        0.357        0.303
Run 5       0.355        0.365        0.280
Run 6       0.370        0.374        0.256
Run 7       0.386        0.383        0.231
Run 8       0.403        0.392        0.204
Run 9       0.421        0.403        0.176
Run 10      0.440        0.413        0.147

4.4  Laser  Temperature  Calibration  Concerns 

During the experimentation process, concerns arose about the laser temperature
calibration. The actual calibrated temperatures regularly differ substantially from the
planned temperatures. The effect estimate for temperature is highly dependent on the
assumption that the planned and actual temperatures are close to equivalent. However,
analysis of the linear models using true calibrated temperatures rather than planned
temperatures displayed no significant impact on effect estimates. The calibrated
temperatures are usually greater than the planned temperatures, so a systematic offset may
exist between planned and true temperature values. The planned and actual temperature
settings are displayed in Table 4.9. The average difference between planned and actual
temperatures is approximately 41 degrees. The two sets have a correlation of
approximately 0.63. The planner could be designed in the future to incorporate the true
temperatures, but this compromises the orthogonality of the design.
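
The two summary statistics quoted above can be checked directly from the values in Table 4.9. The sketch below uses only the standard library; the data are copied from that table and the helper names are hypothetical.

```python
import math

# Planned and actual laser temperatures (Celsius) for the 19 runs in Table 4.9.
PLANNED = [750, 790, 750, 710, 710, 710, 790, 790, 750, 710,
           790, 710, 750, 790, 710, 710, 790, 790, 790]
ACTUAL = [782.92, 828.37, 770.78, 785.53, 780.19, 773.89, 851.45, 877.80,
          844.01, 763.71, 805.09, 713.31, 809.33, 768.11, 789.47, 720.00,
          800.00, 790.00, 820.00]


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


mean_offset = mean([a - p for p, a in zip(PLANNED, ACTUAL)])
correlation = pearson(PLANNED, ACTUAL)
print(f"mean offset = {mean_offset:.2f} C, correlation = {correlation:.2f}")
```

Both values should come out close to the figures of roughly 41 degrees and 0.63 reported in the text.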

Table 4.9. Planned and Actual Laser Temperatures

Run Order   Planned Temp.   Actual Temp.   Temp. Difference
            (Celsius)       (Celsius)      (Celsius)
1           750             782.92          32.92
2           790             828.37          38.37
3           750             770.78          20.78
4           710             785.53          75.53
5           710             780.19          70.19
6           710             773.89          63.89
7           790             851.45          61.45
8           790             877.80          87.80
9           750             844.01          94.01
10          710             763.71          53.71
11          790             805.09          15.09
12          710             713.31           3.31
13          750             809.33          59.33
14          790             768.11         -21.89
15          710             789.47          79.47
16          710             720.00          10.00
17          790             800.00          10.00
18          790             790.00           0.00
19          790             820.00          30.00

V. Conclusion and Future Research


This  chapter  discusses  the  main  conclusions  from  this  study.  The  conclusions  are 
based  on  the  verification  of  the  Response  Surface  Methodology  (RSM)  experiment 
planner,  the  performance  of  the  autonomy  on  the  Adaptive  Rapid  Experimentation  and 
Spectroscopy  (ARES)  system,  and  the  optimization  of  the  carbon  nanotube  growth 
response.  This  chapter  includes  recommendations  on  future  carbon  nanotube 
experimentation  and  RSM  models.  Lastly,  ideas  for  future  research  on  this  topic  are 
presented  in  this  chapter. 

5.1  Summary  of  Conclusions 

The  planner’s  ability  to  execute  the  RSM  process  was  verified  through  pretesting 
and with data from the true system. The results of the analytical techniques were verified
by matching results from the planner with the data analysis from Chapter 4. The
planner’s  algorithms  correctly  identified  the  appropriate  status  indicators  and  completed 
the  necessary  actions  as  designed.  Input  and  output  data  storage  operated  effectively  on 
the  ARES  system. 

Many  difficulties  of  conducting  fully  autonomous  experimentation  were  not 
identified  until  after  the  RSM  planner  was  implemented  into  the  ARES  system.  The 
planner  does  not  currently  have  the  capability  to  perform  well  when  the  rate  of 
unsuccessful experiments is high. The status indicator technique updates after
experiments are planned, but the planner has no means of knowing whether the
experimentation failed due to one of the many nuisances within ARES. During the
supervised experimentation, the status and data files were manually corrected to resume
the RSM process at the appropriate stage. The ARES system does not currently have the
capability to retest the experiments in the planner file without accessing the planner. The
researchers control which response results are entered into the planner, so full autonomy
is expected as long as the appropriate results are provided. The current performance of
autonomy is acceptable, but can greatly improve with the incorporation of the future
research ideas presented later in this chapter.

Results  from  the  actual  RSM  experimentation  revealed  that  the  optimization 
process  is  operating  as  expected.  The  first  search  process  identified  multiple  significant 
effects  on  the  initial  growth  rate.  This  search  appears  to  have  identified  a  potential 
optimal  setting  for  total  pressure  at  15  Torr.  Currently,  the  process  is  primed  for  a 
second  search  process  that  aims  to  find  the  optimal  blend  of  mixture  variables.  No 
interaction  effects  are  significant  for  the  second  search,  so  the  path  should  follow  the 
true  response  gradient  better  than  the  first  search  process.  The  only  major  experimental 
concern  is  the  temperature  setting,  because  of  the  inaccuracy  of  the  laser  temperature 
calibration.  However,  the  effect  estimates  for  temperature  do  not  seem  largely  affected 
by  the  inaccuracy.  The  second  search  process  may  also  pinpoint  the  optimal  planned 
temperature  setting. 

5.2  Recommendations 

The first few experiments in a set seem to have a higher rate of unsuccessful
growth than the rest of the design. The researchers do not always perform trial or
warm-up runs on the ARES system to help ensure that the critical experiments exhibit
growth. A simple recommendation is to perform “warm-up” runs before serious
experimentation, so that the system can reach steady-state performance and, ideally,
provide a higher rate of successful experiments. The loss of results for several
experiments can significantly impact the results of the regression models and the lack of
fit test.


The planners installed on ARES are the only methods available to experiment
with multiple runs. If several experiments in a set fail, the researchers have no way to
immediately retest the unsuccessful runs. Also, if the entire experiment set fails, the
experimental design that is already written to the planner file cannot be retested. The
first recommendation to solve this issue is to develop an interface where multiple
experiments can be entered manually. The warm-up runs can be planned through this
interface as well. The second recommendation is to add the capability to initiate the
retesting of the design already in the planner file.

Before  implementing  the  planner  on  the  ARES  system,  the  planner’s  algorithms 
and  analytical  techniques  were  tested  using  a  multivariate  normal  distribution.  This 
distribution is easy to code in any programming language. The multivariate normal
distribution  only  has  one  peak,  so  it  is  easy  to  interpret  the  results  and  debug  the 
software.  For  similar  problems  that  involve  creating  an  RSM  computer  program,  the 
multivariate  normal  distribution  is  an  effective  way  to  verify  the  algorithms  and 
techniques. 
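
A single-peak test surface of this kind can be sketched in a few lines. The version below is only an illustration: it assumes a diagonal covariance, drops the normalizing constant so that the value at the peak equals a chosen height, and uses a made-up peak location and spread. None of these choices are taken from the thesis pretests.

```python
import math


def mvn_surface(x, peak, spread, height=100.0):
    """Single-peak test response shaped like a multivariate normal density.

    Assumes independent factors (diagonal covariance). The density is left
    unnormalized so that the response at the peak equals `height`.
    """
    z2 = sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, peak, spread))
    return height * math.exp(-0.5 * z2)


# Hypothetical two-factor example: the optimizer under test should climb
# toward PEAK, where the response is largest.
PEAK = [0.9, 1.2]
SPREAD = [0.2, 0.3]

print(mvn_surface(PEAK, PEAK, SPREAD))        # response at the peak
print(mvn_surface([1.1, 1.2], PEAK, SPREAD))  # one spread away on factor 1
```

Because the location of the maximum is known in advance, a search that stops anywhere other than near the peak points to a defect in the algorithm rather than in the data.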

5.3 Future Research in this Area


Advanced  Techniques  in  C# 

As  mentioned  in  previous  chapters,  the  planner  is  limited  in  several  areas  due  to 
the  short  amount  of  time  allotted  to  develop  mathematical  techniques  in  C#.  A  function 
to  find  the  eigenvalues  of  a  matrix  will  improve  the  canonical  analysis.  Additional 
coding  is  required  within  the  search  process  to  add  the  contact  point  experiment  to  the 
boundary  technique  algorithm.  The  assumptions  of  the  regression  models  are  not 
validated  within  the  planner.  It  is  possible  that  certain  algorithms  could  analyze  the 
residuals  of  the  model  and  grade  the  model’s  ability  to  meet  assumptions.  Lastly,  more 
advanced  data  storage  techniques  could  assist  in  saving  and  applying  additional  RSM 
data  such  as  different  experimental  designs. 

Increase  Robustness  of  Status  System 

The  status  indicator  technique  advances  through  the  RSM  process  based  on  the 
assumption  that  experiments  are  successful.  The  status  always  advances  to  the  next  stage 
after experiments are planned. This occurs even when the set of experiments is largely
unsuccessful.  The  status  system  can  improve  with  additional  algorithms  that  analyze 
response  results  for  unsuccessful  experiments.  Afterwards,  the  planner  can  either  create 
models  with  only  the  successful  data  or  decide  to  retest  the  unsuccessful  experiments. 
This  status  system  improvement  will  also  require  more  thought  on  how  response  results 
are  provided  to  the  planner.  The  response  results  must  be  listed  in  the  appropriate  order, 
so  if  an  experiment  is  unsuccessful  some  sort  of  placeholder  should  be  used  to  annotate 
the  problem. 
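
One possible shape for this improvement is sketched below. It is a hypothetical design, not part of the current planner: unsuccessful experiments are recorded as `None` so that the response list keeps its run order, and a simple failure-rate threshold decides whether to fit a model on the successful runs or to retest the failed ones. The threshold value and all names are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

FAILED = None  # hypothetical placeholder for an unsuccessful experiment


def triage_responses(
    responses: Sequence[Optional[float]],
    max_failure_rate: float = 0.25,  # assumed threshold, not from the thesis
) -> Tuple[str, List[int]]:
    """Decide what to do with a set of responses listed in run order.

    Returns ("fit", indices of successful runs) when few enough runs failed,
    otherwise ("retest", indices of failed runs).
    """
    if not responses:
        raise ValueError("no responses provided")
    failed = [i for i, r in enumerate(responses) if r is FAILED]
    succeeded = [i for i, r in enumerate(responses) if r is not FAILED]
    if len(failed) / len(responses) > max_failure_rate:
        return "retest", failed
    return "fit", succeeded


# Example: one failure in four runs stays within the assumed threshold.
print(triage_responses([76.29, FAILED, 56.80, 57.89]))
# Example: two failures in three runs triggers a retest of those runs.
print(triage_responses([FAILED, FAILED, 5.0]))
```

Keeping the placeholder in the list preserves the pairing between each response and its row in the experimental design, which is the ordering requirement described above.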

Additional Functionality to Research Different Problems


The planner is specifically created for the current carbon nanotube growth
research problem. The planner can experiment on fewer factors, but does not update the
experimental design to the most efficient one for that situation. With additional
functionality, the user of the planner could select the number of mixture and process
variables to execute RSM on any research problem. Also, researchers have other response
variables of interest, so the planner can evolve to a multi-objective response
optimization. The planner’s software is adaptable for any experimental system that
requires response optimization.

5.4  Closing  Remarks 

The  AFRL  researchers  provided  feedback  that  supported  many  aspects  of  the 
RSM planner. The researchers are pleased with the maximization of gas variable flow rates
and the various user interface menus. The optimization capability is highly desired and
appears on track to produce significant findings. There is interest in incorporating the
ARES software into other systems at the Air Force Research Laboratory. The RSM
planner  software  is  likely  to  accompany  ARES  and  be  adapted  to  optimize  critical 
responses  of  other  research  problems.  Research  will  continue  on  the  autonomy  aspect  of 
the  planner.  The  primary  goal  is  to  eliminate  the  researcher’s  need  to  make  difficult 
decisions  regarding  experiment  plans. 

Appendix A

Table A.1. List of Status Indicators and Explanations

Status Indicator    Explanation
Start               New RSM process or after completion
FO Full             Entire first-order design planned
FO Block 1          First-order design Block 1 planned
FO Block 2          First-order design Block 2 planned
Search              Search process runs planned
Continue Search     Resume search process (search menu cancel)
Pre Search          Start search process (lack of fit menu cancel)
Pre Axial           Start second-order design (lack of fit menu cancel)
Second Order        Second-order design augmentation planned

Appendix B

Table B.1. List of Data Files and Explanations

Data File               Explanation
Catalyst                Name of the catalyst for the current process
Current Location        Current search location or center of first-order design
First Order Coded       Current coded first-order design
First Order Response    Current first-order design response values
Full Levels             Factor level boundaries from the initialization
Initial Start           Initial search location from the initialization
Level Size              Initial factor level sizes from the initialization
Number of Models        Tracks the number of first-order models
Planner                 File that experiments are written to - main ARES input
Previous Results        Stores previous RSM process results
Response                File that results are saved to - main planner input
Search Coded            Coded design of search experiments
Search Deltas           Search gradient vector
Search Response         Array of search experiment results
Second Order Coded      Coded design of axial runs
Status                  Status indicator files